USB thumb drive RAID
I have a database that I'm working on, and sometimes I need to work on it on my laptop. However, the database is really demanding, and it is just too slow on my laptop's hard disk. I quickly found out that the limitation was the speed of the hard drive, and not so much the CPU. What I needed was a fast external hard drive. Anyway, I always wanted to play with a RAID system.
Hard Disk Performance
There are three parameters of drive speed:
- Read speed
- Write speed
- Access time - this is the time needed by the drive to find the place it should read the information from (or write to) before it starts reading or writing. For example, in your average hard drive, the head has to move to the physical location on the magnetic disk.
I do not need fast read/write speed, as the amount of information that I retrieve from the database is tiny and the db is almost entirely read-only. However, I do need fast access time: the database is huge, and I need to retrieve information from different positions in the database very quickly. That is, I need very low access times, acceptable reading speed, and I do not care about writing.
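To see why access time matters more than transfer rate for this kind of workload, here is a rough back-of-the-envelope sketch. The workload size (100000 random lookups) and the two access times (10 ms for a hard disk, 1 ms for flash) are illustrative assumptions, not measurements, and the function name is made up for this example.

```shell
#!/bin/sh
# Estimate how long a batch of random lookups takes when access time dominates.
# Arguments: number of lookups, access time in milliseconds (both illustrative).
lookup_seconds() {
    awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", n * ms / 1000 }'
}

# Hypothetical workload of 100000 random lookups:
echo "hard disk at 10 ms: $(lookup_seconds 100000 10) s"
echo "flash at 1 ms:      $(lookup_seconds 100000 1) s"
```

With tiny reads scattered all over the database, the transfer itself takes almost no time, so the total is close to the number of lookups multiplied by the access time.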
Solution
It is well known that the so-called "solid-state disks" (SSD) have very low access times. I could have tried to buy an SSD, but being a tinkerer, I decided on another option. Thumb drives / flash drives / pen drives are also a kind of SSD, one could say, but they have lousy transfer rates. In the end, I decided to create a software RAID using four 2GB USB drives. I bought:
- 4 USB drives, 2GB each
- 1 USB hub
Setting up the Software RAID
Prerequisites: you need the mdadm tool (in Debian, simply run apt-get install mdadm).
Insert the drives into the hub, and attach the hub to the computer. Note: if GNOME or whatever mounts the disks automatically, unmount them before continuing. First, it is necessary to find out the names of the devices that were attached:
dmesg | grep "Attached SCSI"
sd 56:0:0:0: [sde] Attached SCSI removable disk
sd 57:0:0:0: [sdf] Attached SCSI removable disk
sd 58:0:0:0: [sdg] Attached SCSI removable disk
sd 59:0:0:0: [sdh] Attached SCSI removable disk
OK, the devices are /dev/sde, /dev/sdf, /dev/sdg, and /dev/sdh. I want a RAID-0; that is, no redundancy, and 4x2GB=8GB of space. Creating the RAID is simple:
mdadm --create --verbose /dev/md0 --level=0 --raid-devices=4 /dev/sd{e,f,g,h}
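If you would rather not copy the device letters by hand, the two steps above can be sketched as a small dry run. This is only an illustration: the function names are made up, the sample lines are the dmesg output shown earlier, and the script only prints the mdadm command instead of running it, so that you can check the device list by eye first.

```shell
#!/bin/sh
# Turn "Attached SCSI removable disk" kernel lines (on stdin) into device paths.
attached_disks() {
    sed -n 's|.*\[\(sd[a-z]*\)] Attached SCSI removable disk.*|/dev/\1|p'
}

# Print (do not run) the mdadm command for a RAID-0 over the given devices.
raid0_command() {
    echo "mdadm --create --verbose /dev/md0 --level=0 --raid-devices=$# $*"
}

# Sample input copied from the dmesg output; on a real system use:
#   dmesg | grep "Attached SCSI" | attached_disks
sample='sd 56:0:0:0: [sde] Attached SCSI removable disk
sd 57:0:0:0: [sdf] Attached SCSI removable disk
sd 58:0:0:0: [sdg] Attached SCSI removable disk
sd 59:0:0:0: [sdh] Attached SCSI removable disk'

# Word splitting of the unquoted substitution is intentional here.
raid0_command $(printf '%s\n' "$sample" | attached_disks)
```

Printing the command rather than executing it is a deliberate choice: creating an array over the wrong devices destroys whatever is on them.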
This way, we have a new block device that can be formatted. I use ext2, since reliability / journaling plays no role:
mkfs.ext2 /dev/md0
tune2fs -c 0 -i 0 /dev/md0
mount /dev/md0 /mnt
The first command creates the filesystem ("formats" the device); the second disables the periodic filesystem checks. Finally, the third command mounts the RAID at /mnt so we can write data to it and read from it.
Stopping and Starting the Array
Stopping the Array
Before you stop the array, run the following (and save the output somewhere):
mdadm --detail /dev/md0
To stop the array that is running, first unmount the directory (umount /mnt) and then stop the array:
mdadm --stop /dev/md0
Now, you can safely remove the disks and, for example, plug them into another machine.
Starting the Array, Again
Before you can use your RAID again, you need to "assemble" it. This is easy if you have not removed the disks and are assembling the array on the same machine. In that case, you can just type:
mdadm --verbose -A /dev/md0 /dev/sd{e,f,g,h}
However, what if the device letters have changed (e.g. not e-h, but i, j, k, l)? Well, you could find out again what the letters are. But there is a better solution. Remember I told you to save the output from "mdadm --detail"? It contained a line like this:
UUID : d7ea744f:c3963d02:982f0012:7010779c
Based on this UUID, we can easily "assemble" the array on just about any computer:
mdadm --verbose -A /dev/md0 -u d7ea744f:c3963d02:982f0012:7010779c
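If you saved the "mdadm --detail" output to a file, the UUID can be pulled out of it instead of being retyped. The following is a minimal sketch under a few assumptions: the function name is made up, the sample text contains only the UUID line quoted above (a real report has many more lines), and the assemble command is printed rather than run.

```shell
#!/bin/sh
# Print the UUID found in saved "mdadm --detail" output read from stdin.
array_uuid() {
    sed -n 's/^[[:space:]]*UUID[[:space:]]*:[[:space:]]*\([0-9a-f:]*\).*/\1/p'
}

# Only the UUID line is taken from the article; the rest of the report is omitted.
saved='UUID : d7ea744f:c3963d02:982f0012:7010779c'

uuid=$(printf '%s\n' "$saved" | array_uuid)
# Print (do not run) the assemble command for that UUID.
echo "mdadm --verbose -A /dev/md0 -u $uuid"
```

With a saved report in a file (say, a hypothetical md0-detail.txt), the same function would be called as array_uuid < md0-detail.txt.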
You can also enter this information in the config file /etc/mdadm/mdadm.conf.
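As a rough sketch of what such a config entry looks like, the snippet below builds an ARRAY line from the device name and the UUID. The helper name is made up for this example, and the line is only printed; appending it to the config file is left as a commented-out step because it needs root and the file location may differ between distributions.

```shell
#!/bin/sh
# Build an ARRAY line for mdadm.conf from a device name and an array UUID.
conf_line() {
    printf 'ARRAY %s UUID=%s\n' "$1" "$2"
}

conf_line /dev/md0 d7ea744f:c3963d02:982f0012:7010779c
# To store it (as root), the output could be appended to the config file:
#   conf_line /dev/md0 <uuid> >> /etc/mdadm/mdadm.conf
```

For an array that is currently running, "mdadm --detail --scan" prints lines of this kind as well, which saves typing the UUID.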
Performance Tests
Test | Description | Results | Comment |
---|---|---|---|
hdparm | reading | 52 MB/s | This is twice as good as my laptop, and worse than the 70MB/s of my SATA disk in my workstation |
dd | writing | 28 MB/s | Half of what my workstation disk can do |
seeker | random access | 0.8-1ms | This is 10-20 times better than an ordinary hard disk |
Notes for the Tests
- hdparm: this is a standard Linux utility (in Debian, install with apt-get install hdparm). The command line is as follows:
hdparm -t /dev/md0
- dd: full command line:
dd if=/dev/zero of=/tmp/test2.bin bs=1M count=1024 conv=fsync
- seeker: I have taken this utility from this page on disk performance tests. It makes purely random seeks on the device, thus simulating the worst-case scenario when small chunks of data need to be read from all over the place. Command line:
seeker /dev/md0
- Furthermore, I have tested the performance of the whole setup with my specific app. It was great! Not as good as the RAM disk that I use for a part of it, but still very, very fast.
- Ever since I set it up a few days ago, I've been using it constantly with a heavy read load and have noticed no problems or errors (like the ones reported here).
- Note that the throughput is capped by the USB connection itself: 480Mbps (USB 2.0) gives you roughly 60MB/s at most.
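The USB limit in the last note is simple arithmetic, sketched below. The function names are made up for this example, protocol overhead is ignored (so the real ceiling is lower), and the percentage is rounded down to a whole number.

```shell
#!/bin/sh
# Convert a link speed in Mbit/s to MB/s (8 bits per byte, overhead ignored).
mbit_to_mbyte() {
    echo $(( $1 / 8 ))
}

# Share of a ceiling (MB/s) reached by a measured rate (MB/s), rounded down.
percent_of() {
    echo $(( $1 * 100 / $2 ))
}

ceiling=$(mbit_to_mbyte 480)
echo "USB 2.0 ceiling: ${ceiling} MB/s"
echo "hdparm result of 52 MB/s: $(percent_of 52 "$ceiling")% of the ceiling"
```

In other words, the 52 MB/s read rate from the table is already close to what a single USB 2.0 connection can deliver, so adding more sticks to the same hub would probably not help much.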
Alternatives and Outlook
I have explained here how to create a RAID-0 from four USB thumb drives. However, most of what I have explained also applies to other RAID types and other disk drives. You can combine just about any devices into a RAID. It only makes sense if the devices have similar sizes, but (i) you can create a RAID out of RAIDs (e.g., join two 2GB USB sticks into a RAID0 /dev/md0, then join /dev/md0 with a 4GB USB stick to get a RAID0 of 8GB) and (ii) you can combine devices of different sizes using LVM (the logical volume manager).
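The "RAID out of RAIDs" idea could look roughly like the dry run below. Everything specific here is an assumption made for illustration: /dev/sde and /dev/sdf stand for the two 2GB sticks, /dev/sdg for the 4GB stick, /dev/md1 for the outer array, and the helper name is made up. The commands are only printed, not executed.

```shell
#!/bin/sh
# Print (do not run) an mdadm command creating a RAID-0 named $1
# from the devices given as the remaining arguments.
create_raid0_cmd() {
    md=$1
    shift
    echo "mdadm --create --verbose $md --level=0 --raid-devices=$# $*"
}

# Inner array: two hypothetical 2GB sticks, 4GB in total.
create_raid0_cmd /dev/md0 /dev/sde /dev/sdf
# Outer array: the inner array plus a hypothetical 4GB stick, 8GB in total.
create_raid0_cmd /dev/md1 /dev/md0 /dev/sdg
```

The filesystem would then go on the outer array (/dev/md1 in this sketch), not on /dev/md0.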
Problems
Apart from some mistakes I made because I did not know 'mdadm', there were no problems. If you run into any, two things are generally of immense help:
- reading the documentation :-) specifically "man mdadm" and the links below, and
- studying the kernel messages. This is best done with
tail -f /var/log/messages
Links
- LinuxInsight features this excellent article on testing the performance of hard disks.
- There has been another attempt to create such an array, but with quite different conclusions and fewer explanations.
- To learn more about Linux software RAID, please read the linux RAID HOWTO and this page in the Ubuntu Wiki.
- Another good HOWTO, with additional, useful tips.
- Yet another experiment with USB thumb drives.
Keywords: usb flash stick thumb drive pendrive linux raid raid0 mdadm
January Weiner is a biologist who uses computational tools to investigate evolutionary processes. He is a postdoc in a bioinformatics group.