Many years ago, I was taking a LOT of pictures at work with an early digital camera that saved its images in JPEG format. While modifying the images to correct contrast problems, I discovered that the camera did a lousy job of compressing the files.
With the standard software provided by the Independent JPEG Group (which contains the marvelous jpegtran utility), I could often reduce file size by 50%. Losslessly. How does this work? A standard JPEG image uses generic entropy-encoding (Huffman) tables; an optimized JPEG image uses tables computed from the statistics of the entire image. Because only this lossless entropy-coding stage changes, the jpegtran utility preserves the "lossiness" of the original JPEG image exactly, even though the JPEG algorithm itself is lossy. The new file is thus lossless with respect to the original file.
Here's the command I used:
# jpegtran -optimize < original.jpg > optimized.jpg
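If you want to convince yourself that nothing was lost, decode both files to raw pixels and compare them. This quick check is my own suggestion rather than part of the workflow, using the djpeg decoder from the same IJG package:

# djpeg original.jpg > original.ppm
# djpeg optimized.jpg > optimized.ppm
# cmp original.ppm optimized.ppm && echo "pixels are identical"

If cmp stays quiet and the echo fires, the decoded images are byte-for-byte identical.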
Unfortunately, I'd taken several hundred photos, and processing them one at a time by hand would have taken forever. So... I wrote one of my first major shell scripts ever. Once I had it working, I was able to optimize all of the images in less than ten minutes! The opt-jpg script was born.
A few weeks later, I discovered that "progressive" JPEG images were sometimes even smaller, but not always. Thus, I modified my script to try it both ways, and to keep whichever result was smaller. This made for a trickier script, but the results were worth it. The opt-jpg script was improved.
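The try-both-ways logic is simple enough to sketch in a few lines of bash. This is not the actual opt-jpg script (which does considerably more); the temp-file naming and the GNU stat size check are just illustrative choices, and the sketch assumes every .jpg in the directory really is a JPEG (the very assumption that later bit me, as described below):

for f in *.jpg ; do
    # generate an optimized baseline version and a progressive version
    jpegtran -optimize "$f" > "$f.base"
    jpegtran -optimize -progressive "$f" > "$f.prog"
    # pick whichever candidate came out smaller
    if [ $(stat -c %s "$f.base") -le $(stat -c %s "$f.prog") ] ; then
        candidate="$f.base"
    else
        candidate="$f.prog"
    fi
    # replace the original only if the candidate actually beats it
    if [ $(stat -c %s "$candidate") -lt $(stat -c %s "$f") ] ; then
        mv "$candidate" "$f"
    fi
    rm -f "$f.base" "$f.prog"
done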
And even later, an unfortunate event with misnamed BMP images forced me to add error-checking, so that the script wouldn't modify non-JPEG files. The opt-jpg script became more robust.
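The guard amounts to checking a file's magic bytes before handing it to jpegtran, rather than trusting its extension. Something like this at the top of the loop above would do it (a sketch; the actual test in opt-jpg may differ):

case $(file -b "$f") in
    JPEG*) ;;                                # genuine JPEG data, proceed
    *)     echo "skipping non-JPEG file: $f"
           continue ;;
esac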
In time, I also wrote a GIF optimization script (based on gifsicle) and a PNG optimization script (based on pngcrush), called opt-gif and opt-png, respectively. These work through colormap reduction, filtering changes, and other such tricks. You'd be amazed at how many images out there have an enormous 256-entry colormap yet use only 3 or 4 of the entries. I recently packaged all of these scripts together and published them as the littleutils.
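The underlying tools can also be run standalone. As a rough illustration (the littleutils wrappers pick their own switches; these flags are merely plausible examples), gifsicle's -O2 requests its full lossless GIF optimization, and pngcrush's -reduce attempts the lossless color-type and bit-depth reduction that rescues those nearly empty colormaps:

# gifsicle -O2 --batch image.gif
# pngcrush -reduce -brute image.png smaller.png

(The -brute switch makes pngcrush try every compression strategy it knows, which is part of why PNG optimization is so slow.)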
While my original motivation in writing these scripts was to deal with lousy digital cameras, they are also well-suited to optimizing all of the graphics on a web site. Why optimize your graphics? To save hard drive space. To fit more images on your site. To reduce the time it takes visitors to load your pages. The reasons are obvious.
So how does it work? We'll demonstrate with the web pages of the Linux Gazette itself. First, let's get all of the website files copied onto a local hard drive. The following command sequence (under bash) will accomplish this:
# wget --no-directories --timestamping --recursive --level=1 \
    --accept=.gz http://linuxgazette.net/ftpfiles/
# for i in *.tar.gz ; do tar -xvzf $i ; done
And before we begin, we need to establish how much filespace our current images require:
# cd lg/
# find . -name "*.jpg" -print | tar -cf ../jpg.tar -T -
# ls -l ../jpg.tar
# find . -name "*.jpeg" -print | tar -cf ../jpeg.tar -T -
# ls -l ../jpeg.tar
# find . -name "*.JPG" -print | tar -cf ../JPG.tar -T -
# ls -l ../JPG.tar

jpg.tar + jpeg.tar + JPG.tar = 44288000 bytes total

# find . -name "*.gif" -print | tar -cf ../gif.tar -T -
# ls -l ../gif.tar

gif.tar = 13066240 bytes total

# find . -name "*.png" -print | tar -cf ../png.tar -T -
# ls -l ../png.tar

png.tar = 21596160 bytes total
Next, you'll need to download and install the littleutils. It's a pretty standard "./configure && make && make install" routine. Once that's done, we can optimize the images:
# find . -name "*.jp*g" -exec opt-jpg {} \;
# find . -name "*.JP*G" -exec opt-jpg {} \;
# find . -name "*.gif" -exec opt-gif {} \;
# find . -name "*.png" -exec opt-png {} \;
After a lengthy wait (PNG optimization is particularly slow), the images will be fully optimized. Repeating the tar commands above gives the following results (over a 6-megabyte improvement!):
jpg.tar + jpeg.tar + JPG.tar = 41185280 bytes total (a 7% savings!)
gif.tar = 12759040 bytes total (a 2.5% savings)
png.tar = 18452480 bytes total (a 15% savings!!)
Also, if you scroll through the results, you'll find that several files are misnamed. In particular, there were a lot of GIF images posing as PNG images. (Apparently a few people out there think that "mv image.gif image.png" is an easy way to convert image files. Not quite...) There were even a few Windows BMP images posing as PNG images. <blegh> A complete list of these files can be found here: badfile.txt. If these files are properly renamed and optimized (or better yet, properly converted and optimized), then further filespace savings can be achieved.
[ Thanks for spotting those for us, Brian; they're all fixed now. I take some small pleasure in noting that all the errors, with the exception of one class, were from before I took over running LG. The errors that came from me - mea culpa - resulted from the fact that "convert" fails to actually change the image type if the original file has a mismatched extension; that script is now also fixed, with the image types forced to the correct ones. -- Ben ]
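If you'd like to hunt for such impostors in your own tree, one quick approach (a handy one-liner of my own, not littleutils' internal check) is to compare what file(1) reports against the extension:

# find . -name "*.png" -print0 | xargs -0 file | grep -v "PNG image"

Anything this prints claims to be a PNG but isn't.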
While this example clearly shows that littleutils can be used to achieve considerable filespace savings, there are two major caveats:
[1] The image optimization in littleutils is aggressive, with all extraneous information being thrown away. This includes comments, ancillary tags, colorspace information, etc. You ought to run a few test cases before optimizing your entire Photoshop collection. (A metadata-preserving alternative is sketched after this list.)
[2] The image optimization in littleutils does not preserve interlacing. GIF and PNG images will always have interlacing removed, and JPEG images may be converted to progressive or non-progressive (depending on which is smaller). If interlacing is particularly important to you, you'll need to skip optimization or modify the scripts to keep the interlacing as you want.
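Regarding caveat [1], plain jpegtran has a documented switch for keeping metadata while still optimizing the entropy coding. If you'd rather retain comments and EXIF markers, something like this (bypassing the littleutils wrapper entirely) would do it:

# jpegtran -optimize -copy all < original.jpg > optimized.jpg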
However, for most website purposes, the optimization scripts found in littleutils work quite well. Merry optimizing!!
For further website optimization, you might also consider using the repeats utility, also from littleutils. This nifty script will find duplicate files in any directory tree. If run in the Linux Gazette directory, the following duplicate files are found: repeats.txt. To reduce website filespace requirements even further, all but one of the duplicates could be deleted, and the HTML references to the deleted duplicates could be pointed to the remaining copy.
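The general idea behind a duplicate finder is easy to sketch, though this one-liner is only an approximation of what repeats actually does: checksum every file, then report the files whose digests collide:

# find . -type f -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

Here GNU uniq compares only the first 32 characters of each line (the MD5 digest itself) and prints each group of matching files separated by blank lines.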
Brian Lindholm is a Virginia Tech graduate and middle-aged mechanical engineer who started programming in BASIC on a TRS-80 Model I (way back in 1980). In the late eighties, he moved to Pascal and C on an IBM PC-compatible.
Over the years, Brian became increasingly disgruntled with the instability and expense of the various Microsoft operating systems. In particular, he hated not being in full control of his system. MOST fortunately for him, however, he had a college roommate who ran Linux (way back in the Linux 0.9 and Slackware 1.0 days). That introduction was all he needed.
Over the years, he's slowly learned more and more, and now manages to keep his Debian system happy and stable (even through two major upgrades: 2.2 to 3.0, and 3.0 to 3.1). [A point of note: his Debian system has NEVER crashed on its own. EVER. Only power failures, attempts to boot off the wrong partition, and a particularly flaky IDE Zip drive ever managed to take it down.] He loves Vim and has found Perl amazingly useful at work.
In his non-Linux life, Brian helps design power generation equipment (big power plant stuff) for a living, occasionally records live music for people, reads too much science fiction, and gets out on the Appalachian Trail as often as he can.