From Kathy
Answered By: Jason Creighton, Faber Fedor, Neil Youngman, Jim Dennis, Jay R. Ashworth, Ben Okopnik, Thomas Adam
I'm confused: if Linux doesn't allow directory hard links, then why does every Linux directory have at least two hard links?
Thanks, Kathy
[Jason] Not exactly sure....but http://linuxgazette.net/issue35/tag/links.html makes for good reading.
[Faber] You know, I'm confused too! Looking into it a little bit, it seems that whether or not directory hard links are allowed depends on the underlying file system. File systems of type vxfs (no, I don't know what that means either) don't allow the creation of directory hard links. I've yet to discover why.
The reason we have them in Linux (. and ..) is, I always assumed, so we have a way to travel up the directory tree (cd ..), otherwise the system would need to know the name of the parent directory (as opposed to just its inode).
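Faber's point can be checked directly: "." inside a directory and the directory's own entry name the same inode, as does ".." inside each child. A minimal sketch (the mktemp scratch directory is just for illustration):

```shell
# Create a scratch directory with one subdirectory
dir=$(mktemp -d)
mkdir "$dir/sub"

# All three of these name the same inode: the parent's entry for itself,
# "." inside it, and ".." inside its child
stat -c '%i' "$dir" "$dir/." "$dir/sub/.."
```

All three numbers printed are identical, which is exactly why "cd .." needs no knowledge of the parent's name.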
Why is . a hard link to the current directory then? <shrug> Because un*x people are lazy typists?
A very interesting question, BTW. I'm interested in finding the answer to it myself.
[Jim]
The system uses hard links to manage the link from the parent to the directory's inode, the . link in that directory and all of the .. entries in all of its subdirectories.
Users (including 'root') are forbidden to create additional hard links because this would make fsck much more difficult to implement, and it might allow one to create hard link loops and dangling subtrees.
Basically the directory linkages are maintained by the filesystem to glue the whole tree together, to ensure that it is truly an acyclic tree structure with a single root.
In other words, it's a policy enforced by the kernel. Some other UNIX systems have allowed root to create hardlinked directories, and it could be done with a disk editor like LDE under Linux (though I'd expect fsck to complain the next time it was run). If you did something degenerate you might cause some interesting problems: possibly even cause the kernel to treat the fs as corrupt and invoke its error handler (remount read-only, panic, or continue), and possibly even cause some kernel threads to run amok or panic the system.
[Neil]
Traditionally, in Unix systems a file or directory is physically deleted from the disk when there are no hard links to it. The rm and rmdir commands remove a directory entry (link). If there is more than one hard link to a file, the file remains; so although we regard the rm command as deleting a file, it only deletes a link to the file. When there are no hard links left to a file or directory, the file system frees up the actual space used by the file. There have to be hard links to directories, or they would be deleted by the filesystem.
ISTR that hard links to directories can only be created by mkdir to ensure that we can't build up cyclic directory structures; otherwise programs such as find, which traverse the directory tree, could loop over the same directory structure forever.
In conclusion, Linux does allow hard links to directories, but it only allows hard links to a directory from itself and its parent directory. These are the two hard links to which you refer.
[Neil] Some ambiguity there. If there is more than one link before the rm command, there will be at least one after the rm command, so the file space isn't freed. Of course rmdir deletes both links to a directory.
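Neil's point, that rm removes a link rather than the file itself, is easy to watch happen (a throwaway mktemp directory, just for illustration):

```shell
dir=$(mktemp -d)
cd "$dir"

echo hello > f1
ln f1 f2                # second hard link; link count is now 2
rm f1                   # removes one link; the data survives
cat f2                  # still prints "hello"
stat -c '%h' f2         # 1: the last link; removing it would free the blocks
```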
[Jason]
Okay, I've looked into this more: It appears that, for some reason or another (another Gang member will no doubt know why), it's a Bad Idea to make hard links to directories. Look here:
root:~# ln lala foo
ln: `lala': hard link not allowed for directory
root:~# strace ln lala foo
execve("/bin/ln", ["ln", "lala", "foo"], [/* 17 vars */]) = 0
uname({sys="Linux", node="jpc.example.com", ...}) = 0
brk(0)                                  = 0x804db0c
open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=19148, ...}) = 0
mmap2(NULL, 19148, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40014000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20\\\1"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=1494904, ...}) = 0
mmap2(NULL, 1256324, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40019000
mprotect(0x40144000, 31620, PROT_NONE)  = 0
mmap2(0x40144000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x12a) = 0x40144000
mmap2(0x4014a000, 7044, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4014a000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4014c000
munmap(0x40014000, 19148)               = 0
stat64("foo", 0xbffffc10)               = -1 ENOENT (No such file or directory)
lstat64("lala", {st_mode=S_IFDIR|0755, st_size=48, ...}) = 0
write(2, "ln: ", 4)                     = 4
write(2, "`lala\': hard link not allowed fo"..., 43) = 43
write(2, "\n", 1)                       = 1
exit_group(1)                           = ?
Notice that, in the strace output, link() never actually gets called. So is 'ln' just trying to save us from ourselves, or is it the kernel or glibc that refuses? I wrote this quick C program:
See attached creighton.c-link.c.txt
root:~# link=/home/jason/prog/c/link
root:~# $link lala foo
Error while linking: Operation not permitted
root:~# strace $link lala foo
<snipped syscall trace that looks very similar to ln's strace output; the important line is:>
link("lala", "foo")                     = -1 EPERM (Operation not permitted)
<snipped more>
root:~#
So, 'ln' sees that you're trying to hardlink directories and doesn't even attempt it, instead giving a useful error message. And since we see the link() syscall actually being performed here and failing with EPERM, it means that the kernel doesn't allow hard linking of directories; it's not the glibc wrapper that refuses. (If it had been glibc, we would not have even seen link() being called: the link() in glibc would have returned without making the link() syscall.)
Now, back to your original question: I have no idea why creating hard links to directories is a bad idea. (It must be, or Linux would allow root to do it.) The LG #35 Answer Guy column has something about this (quoting from the article I linked to in my other email):
<quote>
Some versions of Unix have historically allowed root (superuser) to create hard links to directories --- but the GNU utilities under Linux won't allow it --- so you'd have to write your own code or you'd have to directly modify the fs with a hex editor.
<end quote>
Well, obviously, it's the kernel disallowing it, not the GNU utilities. However, LG #35 was some time ago, so things might have been different then.
[jra]
User-added hardlinks to directories are forbidden because they break the directed acyclic graph structure of the filesystem (which is an ASSERT in Unixiana, roughly), and because they confuse the hell out of file-tree-walkers (a term Multicians will recognize at sight, but Unix geeks can probably figure out without problems too).
(Did I get that right, Ben?)
And indeed, anyone who's ever done
# rm -rf .*
in a user's home directory to clear out all the dotfiles prior to dropping the user will no doubt understand why even the system's three links to a directory (. in ., .. in children, and the named link in the parent) are often two too many.
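The danger is easy to see without running anything destructive: ask the shell what `.*` expands to first. In most Bourne-style shells the glob includes `.` and `..` (the scratch directory here is just for illustration):

```shell
dir=$(mktemp -d)
mkdir "$dir/home"
cd "$dir/home"
touch .profile

# See what rm would actually receive -- note "." and ".." in the list,
# so "rm -rf .*" would happily recurse into the parent directory
echo .*
```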
[Jason]
Ouch! Never thought about that, I'll have to remember that....
[Jason]
Yes, I wrote that before I got to read the rest of the thread. With symlinks, at least it's easy to tell when there's a loop. (BTW, I seem to recall an option in Wine to ignore symlinks because they cause some Windows programs to get very, very confused.)
~/tmp$ ln -s file1 file2
~/tmp$ ln -s file2 file1
~/tmp$ ls -l file*
lrwxrwxrwx 1 jason users 5 Jul 20 16:59 file1 -> file2
lrwxrwxrwx 1 jason users 5 Jul 20 16:59 file2 -> file1
~/tmp$ cat file1
cat: file1: Too many levels of symbolic links
~/tmp$ strace -e trace=open cat file1
open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/lib/libc.so.6", O_RDONLY)        = 3
open("file1", O_RDONLY|O_LARGEFILE)     = -1 ELOOP (Too many levels of symbolic links)
cat: file1: Too many levels of symbolic links
[jra] User-added hardlinks to directories are forbidden because they break the directed acyclic graph structure of the filesystem (which is an ASSERT in Unixiana, roughly), and because they confuse the hell out of file-tree-walkers (a term Multicians will recognize at sight, but Unix geeks can probably figure out without problems too).
[Jason] I just thought of something else:
Disk space management and memory management are the same thing.
UNIX has chosen reference counting for disk space management. Reference counting can't deal with cyclic data structures (right word? I mean data structures that refer to themselves), and thus hardlinking directories is disallowed. If Linux used garbage collection, it would be okay to hardlink directories, if very confusing.
But using GC on filesystems would be slow and offer no real advantages, so reference counting is okay.
Well, root must be able to create hard links, because ln has the option --directory (-d, -F).
[Jason] Try it:
root:~# mkdir dir1
root:~# ln -d dir1 dir2
ln: creating hard link `dir2' to `dir1': Operation not permitted
[Thomas] Then in the same thread....
[jra] And indeed, anyone who's ever done:
# rm -rf .*
[Jim]
The GNU version of 'rm' will refuse to attempt recursion into or unlinking of . and/or .. entries.
However this is still a classic sysadmin tech question. It's almost as common as: "How do I remove a file named -fr?"
[Ben]
rm -- -fr
rm ./-fr
perl -we'unlink "-fr"'
"F8" in Midnight Commander
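The first two of those can be verified in any POSIX shell (the scratch directory is just for illustration):

```shell
dir=$(mktemp -d)
cd "$dir"

touch -- -fr            # "--" ends option parsing, so "-fr" is a filename
rm -- -fr               # same trick on removal
touch ./-fr
rm ./-fr                # a path prefix also stops it from looking like options
```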
[Thomas]
You forgot to mention using Emacs.... You also didn't mention jettisoning the disk into space...
[Jim]
How do you SAFELY remove all the dot files and dot directories under the current directory?
My best answer under Bash is:
rm -fr .??* .[^.]
[Ben]
rm -rf .[^.]*
is what I've always used; this would, of course, ignore files named "..." and so on, but that's not much of an issue in practice.
[Jim]
... this gets anything starting with a dot and followed by at least two more characters (thus . and .. won't match) and also it gets anything starting with a dot and followed by any single character other than a dot (thus getting such unlikely entries as .a .\? etc). This pair of glob patterns should match every possible dot entry except . and ..
However, I did preface it with "under Bash". I happen to know it will work under ash, zsh, tcsh, and most other modern shells. However, if I were on a particularly old shell I'd have to check. The negated character class might not have been in the glob libraries of the earliest Bourne and C shells.
If I really had to write a script for maximum portability:
rm -fr .??*; rm -fr `echo .? | grep -v '\.\.' `
... remove all the longer dot entries in the obvious way, then let echo match all the two character dot entries and grep out the .. entry explicitly.
[Ben] Other interesting situations come up when you want to delete a file named in a foreign language. I've run into a Russian song name that even Midnight Commander couldn't handle. Cutting and pasting to "rm" didn't help either (clearly, some of the characters were the "escaped" types, but I had no idea which ones - long story short, the machine didn't have any Russian fonts on it.) Even "ls -b" failed for the above, for whatever reason. I ended up doing something like
perl -wle'/match/&&print for <*>'
where "match" was a small substring of the characters in the name. Needless to say, "print" became "unlink" when I saw that only the appropriate file matched.