...making Linux just a little more fun!

<-- prev | next -->

Introduction to Shell Scripting, part 4

By Ben Okopnik

A Blast from the Past! Originally published in Issue 55 of Linux Gazette, May 2000

There are two major products that come out of Berkeley: LSD and UNIX. We
don't believe this to be a coincidence.
 -- Jeremy Anderson

The Deep, Dark Secrets of Bash

Deep within the "bash" man page lurk terrible things, not to be approached by the timid or the inexperienced... Beware, Pilgrim: the last incautious spelunker into these mysterious regions was found, weeks later, muttering some sort of strange incantations that sounded like "nullglob", "dotglob", and MAILPATH='/usr/spool/mail/bfox?"You have mail":~/shell-mail?"$_ has mail!"'

(He was immediately hired by an Unnamed Company in Silicon Valley for an unstated (but huge) salary... but that's beside the point.)

<Shrug> What the heck; I've already gone parasailing and scuba-diving this month (and will shortly be taking off on a 500-mile sail up the Gulf Stream); let's keep living La Vida Loca!

Parameter Expansion

The built-in parsing capabilities of "bash" are rather minimal as compared to, say, Perl or AWK; in my best estimate, they're not intended for serious processing, just "quick and dirty" minor-task handling. Nevertheless, they can be very handy for that purpose.

As an example, let's say that you need to differentiate between lowercase and capitalized filenames in processing a directory - I ended up doing that with my backgrounds for X, since some of them look best tiled, and others stretched to full-screen size (file size wasn't quite a good-enough guide). I "capped" all the names of the full-sized pics, and "decapped" all the tiles. Then, as part of my random background selector, "bkgr", I wrote the following:

...

# We need _just_ the filename
fn=$(basename $fnm)

# Set the "-max" switch if the test returns 'true'
[ -z "${fn##[A-Z]*}" ] && MAX="-max"

# Run "xv" with|without the "-max" option based on the test result
xv -root -quit $MAX $fnm &

...

Confusing-looking stuff, isn't it? Well, part of it we already know: the [ -z ... ] is a test for a zero-length string. What about the other part, though?

In order to 'protect' our parameter expansion result from the cold, cruel world (e.g., if you wanted to use the result as part of a filename, you'd need the 'protection' to keep it separate from the trailing characters), we use curly brackets to surround the whole enchilada -

$d is the same as${d} - except that the second variety can be combined with other things without losing its identity, like so:

...
d=Digit

# "Digitize"
echo ${d}ize

# "Digital"
echo ${d}al

# "Digits"
echo ${d}s

# "Digitalis"
echo ${d}alis
...

Now that we have it isolated from the world, friendless and all alone... oops, sorry - that's "shell script", not "horror movie script" - I lose track once in a while... Anyway, now that we've separated the variable out via the curly braces, we can apply a few tools incorporated in Bash to perform some basic parsing. Note that I'm going to show the result after each 'echo' statement as if that statement had been executed.

### Let's pick a nice, longish word to play with.
var="amanuensis"

### ${#var} - return length
echo ${#var}
10

### ${var#word} - cut shortest match from start
echo ${var#*n}
uensis

### ${var##word} - cut longest match from start
echo ${var##*n}
sis

### ${var%word} - cut shortest match from end
echo ${var%n*}
amanue

### ${var%%word} - cut longest match from end
echo ${var%%n*}
ama

### ${var:offset} - return string starting at 'offset'
echo ${var:7}
sis

### ${var:offset:length} - return 'length' characters starting at 'offset'
echo ${var:1:3}
man

### ${var/pattern/string} - replace single match
echo ${var/amanuen/paralip}
paralipsis

### ${var//pattern/string} - replace all matches
echo ${var//a/A}
AmAnuensis

(For the last two operations, if the pattern begins with #, it will match at the beginning of the string; if it begins with %, it will match at the end. If the string is empty, matches will be deleted.)

There's actually a bit more to it - things like variable indirection, and parsing arrays - but, gee, I guess you'll just have to study that man page yourself. <grin> Just consider this as motivational material.

So, now that we've looked at the tools, let's look back at the code -[ -z "${fn##[A-Z]*}" ] Not all that difficult anymore, is it? Or maybe it is; my thought process, in dealing with searches and matches, tends to resemble pretzel-bending. What I did here - and it could be done in a number of other ways, given the above tools - is match for a max-length string (i.e., the entire filename) that begins with an uppercase character. The [ -z ... ] returns 'true' if the resulting string is zero-length (i.e., matches the [A-Z]* pattern), and $MAX is set to "-max".

Note that, since we're matching the entire string, ${fn%%[A-Z]*} would work just as well. If that seems confusing - if all of the above seems confusing - I suggest lots and lots of experimentation to familiarize yourself with it. It's easy: set a parameter value, and experiment, like so -

Odin:~$ experiment=supercallifragilisticexpialadocious
Odin:~$ echo ${experiment%l*}
supercallifragilisticexpia
Odin:~$ echo ${experiment%%l*}
superca
Odin:~$ echo ${experiment#*l}
lifragilisticexpialadocious
Odin:~$ echo ${experiment##*l}
adocious
...and so on. It's the best way to get a feel for what a certain tool does; pick it up, plug it in, put on your safety glasses and gently squuueeeze the trigger. Observe all safety precautions as random deletion of valuable data may occur. Actual results may vary and will often surprise you.

Parameter State

There are times - say, in testing for a range of error conditions that set different variables - when we need to know whether a specific variable is set (has been assigned a value) or not. True, we could test it for length, as I did above, but the utilities provided by "bash" for the purpose provide convenient shortcuts for such occasions.
joe=

### ${parameter:-word} - If parameter is unset, substitute "word"
echo ${joe:-mary}
mary
echo $joe

### ${parameter:?word} - Display "word" or error if parameter is unset
echo ${joe:?"Not set"}
bash: joe: Not set
echo ${joe:?}
bash: joe: parameter null or not set

### ${parameter:=word} - If parameter is unset, set it to "word"
echo ${joe:=mary}
mary
echo $joe
mary

### ${parameter:+word} - "word" is substituted only if parameter is set
echo ${joe:+blahblah}
blahblah

Array Handling

Another built-in capability of "bash", a basic mechanism for handling arrays, allows us to process data that needs to be indexed, or at least kept in a structure that allows individual addressing of each of its members. Consider the following scenario: if I have a phonebook/address list, and want to send my latest "Sailor's Newsletter" to everyone in the "Friends" category, how do I do it? Furthermore, say that I also want to create a list of names of the people I sent it to, or do some other processing... i.e., it becomes necessary to split it up into fields by length - and arrays become one of the very few viable options.

Let's look at what this might involve. Here's a clip of a notional phonebook to be used for the job:

Name Category Address Email
Jim & Fanny Friends Business 101101 Digital Dr. LA CA [email protected]
Fred & Wilma Rocks friends 12 Cave St. Granite, CT [email protected]
Joe 'Da Fingers' Lucci Business 45 Caliber Av. B-klyn NY [email protected]
Yoda Leahy-Hu Friend 1 Peak Fribourg Switz [email protected]
Cyndi, Wendi, & Myndi Friends 5-X Rated St. Holiday FL [email protected]

Whew. This stuff obviously needs to be read in by fields - word counting won't do; neither will a text search. Arrays to the rescue!

#!/bin/bash
# 'nlmail' sends the monthly newsletter to friends listed
# in the phonebook

# "bash" would create the arrays automatically, since we'll
# use the 'name[subscript]' syntax to load the variables -
# but I happen to like explicit declarations.

declare -a name category address email

# A little Deep Hackery to make the 'for' loop read a line at a time
OLD=$IFS
IFS='
'
for line in $(cat phonelist)
do
    # Increment the line counter
    ((x++))

    # Load up the 'name' variable
    name[$x]="${line:0:25}"
    
    # etc. for 'category',
    category[$x]="${line:25:10}"
    
    # etc. for 'address',
    address[$x]="${line:35:25}"
    
    # etc. for 'email'...
    email[$x]="${line:60:20}"
done

# Undo the line-parsing magic
IFS=$OLD

...
At this point, we have the "phonelist" file loaded into the four arrays that we've created, ready for further processing. Each of the fields is easily addressable, thus making the stated problem - that of e-mailing a given file to all my friends - a trivial one (this snippet is a continuation of the previous script):
...
for y in $(seq $x)
do
    # We'll look for the word "friend" in the 'category' field,
    # make it "case-blind", and clip any trailing characters.
    if [ -z "${category[$y]##[Ff]riend*}" ]
    then
        mutt -a Newsletter.pdf -s 'S/V Ulysses News, 6/2000' ${email[$y]}
        echo "Mail sent to ${name[$y]}" >> sent_list.txt
    fi
done
That should do it, as well as pasting the recipients' names into a file called "sent_list.txt" - a nice double-check feature that lets me see if I missed anyone.

The array processing capabilities of "bash" extend a bit beyond this simple example. Suffice it to say that for simple cases of this sort, with files under, say, a couple of hundred kB, "bash" arrays are the way to go.

Note that the above script can be easily generalized - as an example, you could add the ability to specify different phone-lists, criteria, or actions, right from the command line. Once the data is broken up into an easily-addressable format, the possibilities are endless...

Wrapping It Up

Bash, besides being very capable in its role as a command-line interpreter/shell, boasts a large number of rather sophisticated tools available to anyone who needs to create custom programs. In my opinion, shell scripting suits its niche - that of a simple yet powerful programming language - perfectly, fitting between command-line utility usage and full-blown (C, Tcl/Tk, Perl, Python) programming, and should be part of every *nix user's arsenal. Linux encourages the "do it yourself" attitude among its users by giving them access to powerful tools and the means to automate their usage - something that I consider a tighter integration (and that much higher a "usability quotient") between the underlying power of the OS and the user environment.

Until next month -
Happy Linuxing!


Quote Of The Month:
"...Yet terrible as UNIX addiction is, there are worse fates. If UNIX is
the heroin of operating systems, then VMS is barbiturate addiction, the Mac
is MDMA, and MS-DOS is sniffing glue. (Windows is filling your sinuses with
lucite and letting it set.)

You owe the Oracle a twelve-step program."
 -- the Usenet Oracle

References

The "man" pages for 'bash', 'builtins', 'sed', 'mutt'

"Introduction to Shell Scripting - The Basics" by Ben Okopnik, LG #53 "Introduction to Shell Scripting" by Ben Okopnik, LG #54 "Introduction to Shell Scripting" by Ben Okopnik, LG #55 "Introduction to Shell Scripting" by Ben Okopnik, LG #56

 


picture Ben is the Editor-in-Chief for Linux Gazette and a member of The Answer Gang.

Ben was born in Moscow, Russia in 1962. He became interested in electricity at the tender age of six, promptly demonstrated it by sticking a fork into a socket and starting a fire, and has been falling down technological mineshafts ever since. He has been working with computers since the Elder Days, when they had to be built by soldering parts onto printed circuit boards and programs had to fit into 4k of memory. He would gladly pay good money to any psychologist who can cure him of the recurrent nightmares.

His subsequent experiences include creating software in nearly a dozen languages, network and database maintenance during the approach of a hurricane, and writing articles for publications ranging from sailing magazines to technological journals. After a seven-year Atlantic/Caribbean cruise under sail and passages up and down the East coast of the US, he is currently anchored in St. Augustine, Florida. He works as a technical instructor for Sun Microsystems and a private Open Source consultant/Web developer. His current set of hobbies includes flying, yoga, martial arts, motorcycles, writing, and Roman history; his Palm Pilot is crammed full of alarms, many of which contain exclamation points.

He has been working with Linux since 1997, and credits it with his complete loss of interest in waging nuclear warfare on parts of the Pacific Northwest.

Copyright © 2005, Ben Okopnik. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 114 of Linux Gazette, May 2005

<-- prev | next -->
Tux