For decades, experienced Unix users have employed many text processing tools to make document editing tasks much easier. Console utilities such as sed, awk, cut, paste, and join, though useful in isolation, only realise their full potential when combined through pipes.
Recently, Linux has been used for more than just processing ASCII text. The growing popularity of various multimedia formats, in the form of images and audio data, has spurred the development of tools to deal with such files. Many of these tools have graphical user interfaces and cannot operate in the absence of user interaction. There is, however, a growing number of tools that can be operated in batch mode with their interfaces disabled. Some tools are even designed to be used from the command prompt or within shell scripts.
It is this class of tools that this article will explore. Complex media manipulation functions can often be effected by combining simple tools together using techniques normally applied to text processing filters. The focus will be on audio stream processing as these formats work particularly well with the Unix filter pipeline paradigm.
There are a multitude of sound file formats and converting between
them is a frequent operation. The sound exchange utility sox
fulfills this role and is invoked at the command prompt:
sox sample.wav sample.aiff
The above command will convert a WAV file to AIFF format. One can
also change the sample rate, bits per sample (8 or 16), and number of
channels:
sox sample.aiff -r 8000 -b -c 1 low.aiff
low.aiff will contain 8000 single-byte samples per second in a
single channel.
sox sample.aiff -r 44100 -w -c 2 high.aiff
high.aiff will contain 44100 16-bit samples per second per channel, in stereo.
When sox cannot guess the destination format from the file
extension, the format must be specified explicitly:
sox sample.wav -t aiff sample.000
The "-t raw" option indicates a special headerless format that
contains only raw sample data:
sox sample.wav -t raw -r 11025 -sw -c 2 sample.000
As the file has no header specifying the sample rate, bits per sample,
channels, etc., it is a good idea to set these explicitly at the command
line. This is necessary when converting from the raw format:
sox -t raw -r 11025 -sw -c 2 sample.000 sample.aiff
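Because a raw file carries no header, its size in bytes says nothing about its length until the rate, sample width and channel count are known. The sketch below uses a hypothetical helper, raw_duration (not part of sox), which divides the file size by the bytes consumed per second of audio. It assumes a POSIX shell with wc and integer arithmetic, so fractional seconds are truncated.

```shell
#!/bin/sh
# raw_duration FILE RATE BYTES_PER_SAMPLE CHANNELS
# Hypothetical helper (not part of sox): prints how many whole seconds of
# audio a headerless raw file holds, assuming the given sample rate, sample
# width in bytes, and channel count.
raw_duration() {
    rd_file=$1 rd_rate=$2 rd_width=$3 rd_channels=$4
    # Wrapping wc's output in $(( )) strips any padding some systems add.
    rd_size=$(($(wc -c < "$rd_file")))
    rd_bytes_per_second=$((rd_rate * rd_width * rd_channels))
    # Integer division: any fractional second is truncated.
    echo $((rd_size / rd_bytes_per_second))
}
```

Read as 44.1 kHz 16-bit stereo, a 176400-byte file lasts one second; read as 22.05 kHz 16-bit stereo, the very same bytes last two. Raw data labelled with the wrong rate therefore plays at the wrong speed.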
One need not use the "-t raw" option if the file
extension is .raw; however, this option is essential when the
raw samples are coming from standard input or being sent to standard
output. To do this, use "-" in place of the
file name:
sox -t raw -r 11025 -sw -c 2 - sample.aiff < sample.raw
sox sample.aiff -t raw -r 11025 -sw -c 2 - > sample.raw
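Why would we want to do this? This usage style allows sox to
be used as a filter in a command pipeline. Because raw data carries
no header, the second sox process believes whatever rate it is told.
Relabelling 44.1 kHz samples as 32 kHz makes the result play more
slowly and at a lower pitch:
sox sample.aiff -t raw -r 44100 -sw -c 2 - | sox -t raw -r 32000 -sw -c 2 - slow.aiff
Resampling to 32 kHz first and then relabelling the data as 44.1 kHz
has the opposite effect:
sox sample.aiff -t raw -r 32000 -sw -c 2 - | sox -t raw -r 44100 -sw -c 2 - fast.aiff
Ordinary byte-oriented tools work on the raw stream too. At 44100
samples per second, 2 bytes per sample and 2 channels, one second of
audio occupies 176400 bytes, so head can extract the first two seconds:
sox sample.aiff -t raw -r 44100 -sw -c 2 - | head -c 352800 | sox -t raw -r 44100 -sw -c 2 - twosecs.aiff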
Likewise, to extract the last second of a sample:
sox sample.aiff -t raw -r 44100 -sw -c 2 - | tail -c 176400 |
sox -t raw -r 44100 -sw -c 2 - lastsec.aiff
and the third second:
sox sample.aiff -t raw -r 44100 -sw -c 2 - | tail -c +352801 |
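head -c 176400 | sox -t raw -r 44100 -sw -c 2 - thirdsec.aiff
Note that "tail -c +N" starts output at byte N, skipping the first
N-1 bytes. N-1 must be a multiple of the frame size (4 bytes for
16-bit stereo); otherwise the raw samples become misaligned or the
left and right channels are swapped. With 16-bit samples this means
N is always odd.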
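The head and tail byte counts used above can be derived mechanically. The sketch below uses a hypothetical helper, raw_window (not part of sox or coreutils), that converts a start time and a length in milliseconds into the "tail -c +N" and "head -c COUNT" arguments. It counts whole sample frames before converting to bytes, so every offset stays a multiple of the frame size; positions that fall between frames are truncated.

```shell
#!/bin/sh
# raw_window START_MS LENGTH_MS RATE BYTES_PER_SAMPLE CHANNELS
# Hypothetical helper: prints "+N COUNT", the arguments for
# "tail -c +N | head -c COUNT" that select a time window from a raw stream.
raw_window() {
    rw_start=$1 rw_length=$2 rw_rate=$3 rw_width=$4 rw_channels=$5
    rw_frame=$((rw_width * rw_channels))          # bytes per sample frame
    rw_skip=$((rw_start * rw_rate / 1000))        # whole frames before the window
    rw_keep=$((rw_length * rw_rate / 1000))       # whole frames inside the window
    # tail -c +N begins at byte N (1-based), so add 1 to the skipped byte count.
    echo "+$((rw_skip * rw_frame + 1)) $((rw_keep * rw_frame))"
}

# Example (requires sox): extract the third second of a 44.1 kHz 16-bit stereo sample
#   set -- $(raw_window 2000 1000 44100 2 2)
#   sox sample.aiff -t raw -r 44100 -sw -c 2 - | tail -c "$1" | head -c "$2" |
#   sox -t raw -r 44100 -sw -c 2 - thirdsec.aiff
```

For the third second of a CD-quality sample this yields "+352801 176400", the same figures used in the command above.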
One can extract parts of different samples and join them together into
one file via nested sub-shell commands:
(sox sample-1.aiff -t raw -r 44100 -sw -c 2 - | head -c 176400
sox sample-2.aiff -t raw -r 44100 -sw -c 2 - | head -c 176400 ) |
sox -t raw -r 44100 -sw -c 2 - newsample.aiff
Here a child shell writes the first second of raw samples from each
of two different files to its standard output. This is piped to a sox
process executing in the parent shell, which creates the resulting
file.
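To see that the child shell simply concatenates the two raw byte streams, one can replace each sox command with a stand-in that emits one second of 44.1 kHz 16-bit stereo silence from /dev/zero. The function name one_second is hypothetical; the sketch assumes head supports -c, as it does on Linux.

```shell
#!/bin/sh
# Stand-in for each sox command: one second of 44.1 kHz 16-bit stereo
# silence, i.e. 176400 zero bytes, written to standard output.
one_second() {
    head -c 176400 /dev/zero
}

# As in the sox example, the parentheses run both commands in a child shell.
# Their outputs go one after the other into the same pipe, so the reader
# sees a single stream of twice the length.
joined_bytes=$( (one_second; one_second) | wc -c | tr -d ' ')
echo "$joined_bytes"
```

The reader receives 352800 bytes, exactly two seconds of audio. A brace group, { one_second; one_second; }, would join the streams the same way without starting a child shell.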
Sounds can be sent to the Open Sound System (OSS) device /dev/dsp
with the "-t ossdsp" option:
sox sample.aiff -t ossdsp /dev/dsp
The sox package usually includes a platform-independent
script play that invokes sox with the appropriate
options. The previous command can then be replaced simply by:
play sample.aiff
Audio samples played this way monopolise the output hardware. Another sound-capable application must wait until the audio device is freed before attempting to play more samples. Desktop environments such as GNOME and KDE provide facilities to play more than one audio sample simultaneously. Samples may be issued by different applications at any time without having to wait, although not every audio application knows how to do this for each of the various desktops. sox is one such program that lacks this capability. However, with a little investigation of the audio media services provided by GNOME and KDE, one can devise ways to overcome this shortcoming.
There are quite a few packages that allow audio device sharing. One common strategy is to run a background server to which client applications must send their samples to be played. The server then grabs control of the sound device and forwards the audio data to it. Should more than one client send samples at the same time the server mixes them together and sends a single combined stream to the output device.
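The Enlightened Sound Daemon (ESD) uses this method. The server, esd, can often be found running in the background of GNOME desktops. The ESD package goes by the name esound on most distributions and includes a few simple client applications, such as esdcat, which plays raw samples (44.1 kHz, 16-bit stereo by default) read from standard input. For example, to play the first second of one sample, or all of a CD-format file: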
sox sample.aiff -t raw -r 44100 -sw -c 2 - | head -c 176400 | esdcat
sox sample.cdr -t raw -r 44100 -sw -c 2 - | esdcat
The Analog RealTime Synthesizer (ARtS) is similar to ESD but is often used
with KDE. The background server is artsd with the
corresponding client programs, artsplay and artscat.
To play the last two seconds of a sample:
sox sample.cdr -t raw -r 44100 -sw -c 2 - | tail -c 352800 | artscat
Neither ESD nor ARtS depends on any particular desktop environment. With some work, one could in theory use ESD with KDE and ARtS with GNOME. Each can even be used within a console login session. Thus one can mix samples, encoded in a plethora of formats, with or without the graphical desktop interface.
Having covered what goes on the end of an audio pipeline, we should consider what can be placed at the start. Sometimes one would like to manipulate samples extracted from music files in MP3, MIDI, or module (MOD, XM, S3M, etc.) format. Command line tools exist for each of these formats that will output raw samples to standard output.
For MP3 music one can use "maplay -s":
maplay -s music.mp3 | artscat
music.mp3 must be encoded at 44.1 kHz stereo to play
properly; otherwise artscat or esdcat must be told the
correct rate and number of channels:
maplay -s mono22khz.mp3 | esdcat -r 22050 -m
maplay -s mono22khz.mp3 | artscat -r 22050 -c 1
Alternatively, one can use "mpg123 -s". Additional
arguments ensure that the output is at the required rate and number of
channels:
mpg123 -s -r 44100 --stereo lowfi.mp3 | artscat
Users of Ogg Vorbis may use the following:
ogg123 -d raw -f - music.ogg | artscat
Piping is not really necessary here since ogg123 has built-in
ESD and ARtS output drivers. Nevertheless, it is still useful to have
access to a raw stream of sample data which one can feed through a
pipeline.
Music files can also be obtained in MIDI format. If (like me) you
have an old sound card with poor sequencer hardware, you may find that
timidity can work wonders. Normally this package converts
MIDI files into sound samples for direct output to the sound device.
Carefully chosen command line options can redirect this output:
timidity -Or1sl -o - -s 44100 music.mid | artscat
The "-o -" option sends sample data to standard
output, "-Or1sl" selects raw 16-bit signed linear
samples, and "-s 44100"
sets the sample rate appropriately.
If you're a fan of the demo scene you might want to play a few music
modules on your desktop. Fortunately mikmod can play most of
the common module formats. The application can also output directly
to the sound device or via ESD. The current stable version of
libmikmod, 3.1.9, does not seem to be ARtS-aware yet. One can
remedy this using a command pipeline:
mikmod -d stdout -q -f 44100 music.mod | artscat
The -q option is needed to turn off the curses interface,
which also uses standard output. If you still want access to this
interface you should try the following:
mikmod -d pipe,pipe=artscat -f 44100 music.mod
Only later versions of mikmod know how to create their
own output pipelines.
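sox can also apply effects as it processes a sample. Effect names and
their parameters are given after the output file name, and the play
script accepts them after the input file:
play sample.aiff echo 1 0.6 150 0.6
play sample.aiff vibro 20 0.9
play sample.aiff flanger 0.7 0.7 4 0.8 2
play sample.aiff phaser 0.6 0.6 4 0.6 2
play sample.aiff band 3000 700
play sample.aiff band 0 700
play sample.aiff chorus 0.7 0.7 20 1 5 2 -s
play sample.aiff reverse
Effects work just as well in the middle of a pipeline, with sox reading
raw samples from standard input and writing the processed stream to
standard output. Here a chorus is added to a module before it reaches
artscat:
mikmod -d stdout -q -f 44100 music.xm | sox -t raw -r 44100 -sw -c 2 - -t raw - chorus 0.7 0.7 80 0.5 2 1 -s | artscat
Byte-oriented tools still apply. This skips the first four seconds
(705600 bytes) of an Ogg Vorbis file:
ogg123 -d raw -f - music.ogg | tail -c +705601 | artscat
A pipeline can also end in an encoder rather than a player. With the
--raw option, oggenc reads raw 44.1 kHz 16-bit stereo samples from
standard input, so a MIDI file can be rendered, given an echo, and
compressed in one step:
timidity -Or1sl -o - -s 44100 music.mid | sox -t raw -r 44100 -sw -c 2 - -t raw - echo 1 0.6 80 0.6 | oggenc -o music.ogg --raw -
Likewise, a 32 kHz mono MP3 can be reduced to half volume, resampled to
44.1 kHz and split into two channels before encoding:
maplay -s mono32.mp3 | sox -v 0.5 -t raw -r 32000 -sw -c 1 - -t raw -r 44100 -c 2 - split | oggenc -o music.ogg --raw -
Finally, a shell loop can feed many files into one pipeline. This joins
every AIFF file in the current directory, at half volume, into a single
8 kHz unsigned 8-bit mono WAV file:
for x in *.aiff; do sox -v 0.5 "$x" -t raw -r 8000 -bu -c 1 -; done | sox -t raw -r 8000 -bu -c 1 - all.wav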
Hopefully these examples hint at what can be accomplished with the pipeline technique. One cannot argue against using interactive applications with elaborate graphical user interfaces. They often can perform much more complicated tasks while saving the user from having to memorise pages of argument flags. There will always be instances, however, where command pipelines are more suitable. Converting a large number of sound samples will require some form of scripting, and interactive programs cannot be invoked as part of an at or cron job.
Audio pipelines can also be used to save disk space. One need not store a dozen copies of what is essentially the same sample with different modifications applied. Instead, create a dozen scripts each with a different pipeline of filters. These can be invoked when the modified version of the sound sample is called for. The altered sound is generated on demand.
I encourage you to experiment with the tools described in this
article. Try combining them together in increasingly elaborate
sequences. Most importantly, remember to have fun while
doing so.
Adrian J Chung
When not teaching undergraduate computing at the University of the West
Indies, Trinidad, Adrian writes system-level scripts to manage a network
of Linux boxes, and conducts experiments with interfacing various scripting
environments with home-brew computer graphics renderers and data visualization
libraries.
Copyright © 2001, Adrian J. Chung.
Copying license http://www.linuxgazette.net/copying.html
Published in Issue 73 of Linux Gazette, December 2001