Unix and NTFS file systems, hardlinks, inodes, files, directories, dot directories, bugs and implementation details
Posted by jpluimers on 2021/09/21
Lots of interesting tidbits on Unix and NTFS file systems.
If you want to blow up your tooling, try creating a recursive hardlink…, which is likely one of the reasons that *nix file systems do not support them.
Covered and related topics:
- inode/vnode (which are equivalent) and NTFS File ID:
  - [WayBack] vnode/Virtual file system – Wikipedia
  - [WayBack] inode – Wikipedia
  - NTFS File ID works in almost the same way as vnode/inode: [WayBack] Does Windows have Inode Numbers like Linux? – Stack Overflow
    in terms of having an integer that uniquely identifies a file, NTFS and some Windows API expose the concept of “file IDs” which is similar.
- [WayBack] Hard link – Wikipedia: any file system supports at least a hard link count of 1 to bind names to files and special files (like directories: they contain directory entries)
- [WayBack] filesystems – Why are hard links to directories not allowed in UNIX/Linux? – Unix & Linux Stack Exchange
- [WayBack] filesystem – Why are hard links not allowed for directories? – Ask Ubuntu
- [WayBack] powershell – Recursive hard link – Stack Overflow
- [WayBack] You can create an infinitely recursive directory tree | The Old New Thing
- [WayBack] How do I find the original name of a hard link? | The Old New Thing
- [WayBack] Symbolic link – Wikipedia
- [WayBack] NTFS – Wikipedia which is derived from [WayBack] Files-11 – Wikipedia, which has roots in [WayBack] TOPS-20 – Wikipedia (hello TENEX!) and [WayBack] RSTS/E – Wikipedia
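To make the hard link idea from the links above concrete, here is a minimal POSIX C sketch. It is an illustration only and is not taken from any of the linked pages. It assumes that /tmp sits on a file system that supports hard links; `hardlink_demo` and `struct link_info` are hypothetical names. The sketch creates a file, gives it a second name with link(), and records what stat() reports for both names.

```c
#define _XOPEN_SOURCE 700   /* for mkdtemp() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* What stat() reports for the two names of one file. */
struct link_info {
    ino_t first_ino;    /* inode number seen through the first name */
    ino_t second_ino;   /* inode number seen through the hard link */
    nlink_t nlink;      /* link count while both names exist */
};

/* Hypothetical helper: creates "original" and a hard link "alias" in a
 * scratch directory (assumed to be under /tmp), fills *out, cleans up.
 * Returns 0 on success, -1 on failure. */
int hardlink_demo(struct link_info *out)
{
    char dir[] = "/tmp/hardlink-demo-XXXXXX";
    char a[64], b[64];
    struct stat sa, sb;
    int fd, rc = -1;

    assert(out != NULL);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof a, "%s/original", dir);
    snprintf(b, sizeof b, "%s/alias", dir);

    fd = open(a, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        /* link() adds a second directory entry for the same inode */
        if (link(a, b) == 0 && stat(a, &sa) == 0 && stat(b, &sb) == 0) {
            out->first_ino = sa.st_ino;
            out->second_ino = sb.st_ino;
            out->nlink = sa.st_nlink;
            rc = 0;
        }
    }
    unlink(b);   /* harmless if the link was never created */
    unlink(a);
    rmdir(dir);
    return rc;
}
```

Calling `hardlink_demo(&info)` from a small main() should show the same inode number for both names and a link count of 2, which is the sense in which neither name is "the original".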
The tweets (especially follow the train of thought in the various subtrees: a great way to learn new things!):
- [WayBack] 🔎Julia Evans🔍 on Twitter: “TIL that ‘..’ is a hard link to the parent directory’s inode (and maybe that hard links were _invented_ specifically to make the Unix filesystem hierarchical??)”
- [WayBack] Alex Lipov on Twitter: “What happens if parent directory is part of another filesystem? Hard links are obviously supported only within a same fs..… “
- [WayBack] 🔎Julia Evans🔍 on Twitter: “ooh I’m not sure! this random stack overflow answer suggests that maybe `..` between filesystems works because directory inodes are “emulated” by the VFS and so the directory inodes aren’t the actual inodes in the real fs https://t.co/4BRl4C7ICP… “
- [WayBack] RhodiumToad on Twitter: “@alex_lipov @b0rk It’s up to the lookup function (traditionally called namei()) to detect two special cases for ‘..’: the current process root dir, and mountpoints”
- [WayBack] Rich Teer on Twitter: “@b0rk Most regular files use hard links (it’s just that most of the time they have a reference count of only 1).”
- [WayBack] ᔕᗩᗰ ᑕ 2020 ᐯIᔕIOᑎ on Twitter: “@b0rk I can’t find a mirror/archive but wanting to hide .. is why dot files (eg .config) are hidden – and it’s a bug!”
- [WayBack] Cyberhausmeister on Twitter: “@b0rk Isn’t ‘.’ also a hard link? The link counter of an empty directory is 2”
- [WayBack] Random832 on Twitter: “link count is normally 2 + number of subdirectories, but not all filesystems do it, and I don’t think it’s part of the formal spec. HFS on macOS sets it to 1 [which a classical model fs never returns for a directory] because there’s no efficient way to calculate that value.… “
- [WayBack] Ferran on Twitter: “What does it imply? What would happen if it were a soft link instead?… “
- [WayBack] 🔎Julia Evans🔍 on Twitter: “thinking through it now: if it were a soft link it would have to be absolute (like .. => /home/bork) and so the soft link wouldn’t work if you moved the directory around… “
- [WayBack] Chris Gerhard 🇪🇺 on Twitter: “symbolic links were a BSD invention. They did not exist in early UNIX. This is also why creating a hardlink to a directory is a bad idea even on filesystems that support it.… “
- [WayBack] 🔎Julia Evans🔍 on Twitter: “why is creating a hard link to a directory a bad idea?… “
- [WayBack] Chris Gerhard 🇪🇺 on Twitter: “Consider the dir /a/b/c if I now create a new link /d that points to /a/b/c what does “..” mean?… “
- [WayBack] 🔎Julia Evans🔍 on Twitter: “ohh I get it yeah… “
- [WayBack] __red__ on Twitter: “The confusion is that we’re talking at different levels of abstraction. .. being a hardlink to the parent directory is an implementation detail in the filesystem code. That is very different to what happens when you type “cd ..”… “
- [WayBack] Chris Gerhard 🇪🇺 on Twitter: “Indeed. It would be possible to make this work, but at the cost of complexity, for very limited benefit. If you want the same effect then you can use a loop back file system…”
- [WayBack] __red__ on Twitter: “I don’t understand what you said here. We both agree that cd .. isn’t implemented by using the .. inode to identify where to go – right? (That was the root of my tweet – the values can be inconsistent because cd doesn’t give two hoots about the values of the inodes wrt ..)… “
- [WayBack] Chris Gerhard 🇪🇺 on Twitter: “Historically the cwd was stored in the process as a pointer to a vnode and vnodes had no reference to their path, so .. is the vnode’s parent and vnodes can have only one parent. Changing that introduces complexity for little gain.… “
- [WayBack] John Kemp on Twitter: “There are a lot of tools that will get into a bit of an infinite recursion while trying to scan a directory if you create a loop using hardlinks, especially on Windows where they’re not seen as often… “
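The claims in the tweets about `.` and `..` can be checked from user space. The following POSIX C sketch is an illustration under assumptions: it uses a scratch directory under /tmp, and `dot_entries_demo`, `same_file` and `struct dot_info` are hypothetical names. It compares the device and inode numbers that stat() returns for `child/.` and `child/..` with those of the child and its parent. It also records the link count of the empty child directory, which is 2 on classical Unix file systems but, as noted in the thread, is not guaranteed everywhere.

```c
#define _XOPEN_SOURCE 700   /* for mkdtemp() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

struct dot_info {
    int dot_is_self;       /* 1 if "child/." is the same file as "child" */
    int dotdot_is_parent;  /* 1 if "child/.." is the same file as the parent */
    nlink_t child_nlink;   /* link count of the empty child directory */
};

/* Two names refer to the same file when device and inode number match. */
static int same_file(const struct stat *x, const struct stat *y)
{
    return x->st_dev == y->st_dev && x->st_ino == y->st_ino;
}

/* Hypothetical helper: builds parent/child under /tmp, fills *out,
 * cleans up. Returns 0 on success, -1 on failure. */
int dot_entries_demo(struct dot_info *out)
{
    char parent[] = "/tmp/dot-demo-XXXXXX";
    char child[64], dot[80], dotdot[80];
    struct stat sp, sc, sd, sdd;
    int rc = -1;

    assert(out != NULL);
    if (mkdtemp(parent) == NULL)
        return -1;
    snprintf(child, sizeof child, "%s/child", parent);
    snprintf(dot, sizeof dot, "%s/.", child);
    snprintf(dotdot, sizeof dotdot, "%s/..", child);

    if (mkdir(child, 0700) == 0) {
        if (stat(parent, &sp) == 0 && stat(child, &sc) == 0 &&
            stat(dot, &sd) == 0 && stat(dotdot, &sdd) == 0) {
            out->dot_is_self = same_file(&sd, &sc);
            out->dotdot_is_parent = same_file(&sdd, &sp);
            out->child_nlink = sc.st_nlink;
            rc = 0;
        }
        rmdir(child);
    }
    rmdir(parent);
    return rc;
}
```

Note that this only shows what path lookup returns. Whether `..` is stored on disk as a real directory entry or synthesized by the lookup code is exactly the implementation detail the thread argues about, and the sketch cannot tell the difference.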
It is important to understand that the concept of File IDs and inodes/vnodes has far-reaching consequences, for instance from [WayBack] inode – Wikipedia:
- Files can have multiple names. If multiple names hard link to the same inode then the names are equivalent; i.e., the first to be created has no special status. This is unlike symbolic links, which depend on the original name, not the inode (number).
- An inode may have no links. An unlinked file is removed from disk, and its resources are freed for reallocation but deletion must wait until all processes that have opened it finish accessing it. This includes executable files which are implicitly held open by the processes executing them.
- It is typically not possible to map from an open file to the filename that was used to open it. The operating system immediately converts the filename to an inode number then discards the filename. This means that the getcwd() and getwd() library functions search the parent directory to find a file with an inode matching the working directory, then search that directory’s parent, and so on until reaching the root directory. SVR4 and Linux systems maintain extra information to make this possible.
- Historically, it was possible to hard link directories. This made the directory structure into an arbitrary directed graph contrary to a directed acyclic graph. It was even possible for a directory to be its own parent. Modern systems generally prohibit this confusing state, except that the parent of root is still defined as root. The most notable exception to this prohibition is found in Mac OS X (versions 10.5 and higher) which allows hard links of directories to be created by the superuser.[10]
- A file’s inode number stays the same when it is moved to another directory on the same device, or when the disk is defragmented which may change its physical location. This also implies that completely conforming inode behavior is impossible to implement with many non-Unix file systems, such as FAT and its descendants, which don’t have a way of storing this invariance when both a file’s directory entry and its data are moved around.
- Installation of new libraries is simple with inode file systems. A running process can access a library file while another process replaces that file, creating a new inode, and an all-new mapping will exist for the new file so that subsequent attempts to access the library get the new version. This facility eliminates the need to reboot to replace currently mapped libraries.
- It is possible for a device to run out of inodes. When this happens, new files cannot be created on the device, even though there may be free space available. This is most common for use cases like mail servers which contain many small files. File systems (such as JFS or XFS) escape this limitation with extents or dynamic inode allocation, which can “grow” the file system or increase the number of inodes.
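Two of the consequences listed above are easy to observe: the inode number survives a rename within the same device, and an unlinked file stays readable through a descriptor that is still open. The POSIX C sketch below is an illustration, not code from the Wikipedia article. It assumes a scratch directory under /tmp on a local file system; `inode_facts_demo` and `struct inode_facts` are hypothetical names.

```c
#define _XOPEN_SOURCE 700   /* for mkdtemp() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

struct inode_facts {
    int same_ino_after_rename;      /* 1 if rename() kept the inode number */
    nlink_t nlink_after_unlink;     /* link count seen through the open fd */
    ssize_t readable_after_unlink;  /* bytes read after the name was removed */
};

/* Hypothetical helper: writes 5 bytes, renames the file, unlinks it while
 * it is still open, then reads it back. Returns 0 on success, -1 on failure. */
int inode_facts_demo(struct inode_facts *out)
{
    char dir[] = "/tmp/inode-demo-XXXXXX";
    char a[64], b[64], buf[16];
    struct stat before, after, gone;
    int fd, rc = -1;

    assert(out != NULL);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof a, "%s/before", dir);
    snprintf(b, sizeof b, "%s/after", dir);

    fd = open(a, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd >= 0) {
        if (write(fd, "hello", 5) == 5 &&
            fstat(fd, &before) == 0 &&
            rename(a, b) == 0 &&          /* new name, same inode */
            stat(b, &after) == 0 &&
            unlink(b) == 0 &&             /* last name removed, fd still open */
            fstat(fd, &gone) == 0 &&
            lseek(fd, 0, SEEK_SET) == 0) {
            out->same_ino_after_rename = (before.st_ino == after.st_ino);
            out->nlink_after_unlink = gone.st_nlink;
            out->readable_after_unlink = read(fd, buf, sizeof buf);
            rc = 0;
        }
        close(fd);   /* only now can the data blocks be reclaimed */
    }
    unlink(a);       /* cleanup in case an earlier step failed */
    unlink(b);
    rmdir(dir);
    return rc;
}
```

On common local file systems the link count seen after the unlink is typically 0 while the 5 bytes are still readable. Network file systems may behave differently, so treat that field as informational.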
A very cool read in the midst of the tweet tree was this reference to a former Google Plus post by [WayBack] Rob Pike – Wikipedia (of Golang, Unix team and Plan 9 fame):
A lesson in shortcuts.
Long ago, as the design of the Unix file system was being worked out, the entries . and .. appeared, to make navigation easier. I’m not sure but I believe .. went in during the Version 2 rewrite, when the file system became hierarchical (it had a very different structure early on). When one typed ls, however, these files appeared, so either Ken or Dennis added a simple test to the program. It was in assembler then, but the code in question was equivalent to something like this:
if (name[0] == '.') continue;
This statement was a little shorter than what it should have been, which is
if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) continue;
but hey, it was easy.
Two things resulted.
First, a bad precedent was set. A lot of other lazy programmers introduced bugs by making the same simplification. Actual files beginning with periods are often skipped when they should be counted.
Second, and much worse, the idea of a “hidden” or “dot” file was created. As a consequence, more lazy programmers started dropping files into everyone’s home directory. I don’t have all that much stuff installed on the machine I’m using to type this, but my home directory has about a hundred dot files and I don’t even know what most of them are or whether they’re still needed. Every file name evaluation that goes through my home directory is slowed down by this accumulated sludge.
I’m pretty sure the concept of a hidden file was an unintended consequence. It was certainly a mistake.
How many bugs and wasted CPU cycles and instances of human frustration (not to mention bad design) have resulted from that one small shortcut about 40 years ago?
Keep that in mind next time you want to cut a corner in your code.
(For those who object that dot files serve a purpose, I don’t dispute that but counter that it’s the files that serve the purpose, not the convention for their names. They could just as easily be in $HOME/cfg or $HOME/lib, which is what we did in Plan 9, which had no dot files. Lessons can be learned.)
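As an illustration of the two tests quoted in the post above (this is not part of Rob Pike's text, and certainly not the original ls code), here is a small POSIX C sketch that counts directory entries with either filter. `count_entries` and `dotfile_demo` are hypothetical names, and the scratch directory under /tmp is an assumption.

```c
#define _XOPEN_SOURCE 700   /* for mkdtemp() */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Counts the entries in path. With shortcut != 0 it applies the
 * one-character test from the post; otherwise it skips only "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_entries(const char *path, int shortcut)
{
    DIR *d;
    struct dirent *e;
    int n = 0;

    assert(path != NULL);
    d = opendir(path);
    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        const char *name = e->d_name;
        if (shortcut) {
            if (name[0] == '.') continue;   /* also hides ".hidden" */
        } else {
            if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) continue;
        }
        n++;
    }
    closedir(d);
    return n;
}

/* Hypothetical helper: creates ".hidden" and "visible" in a scratch
 * directory and counts them both ways. Returns 0 on success, -1 on failure. */
int dotfile_demo(int *strict_count, int *shortcut_count)
{
    char dir[] = "/tmp/dotfile-demo-XXXXXX";
    char hidden[64], visible[64];
    int fd, rc = -1;

    assert(strict_count != NULL && shortcut_count != NULL);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(hidden, sizeof hidden, "%s/.hidden", dir);
    snprintf(visible, sizeof visible, "%s/visible", dir);

    fd = open(hidden, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        fd = open(visible, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd >= 0) {
            close(fd);
            *strict_count = count_entries(dir, 0);
            *shortcut_count = count_entries(dir, 1);
            rc = (*strict_count >= 0 && *shortcut_count >= 0) ? 0 : -1;
        }
    }
    unlink(hidden);
    unlink(visible);
    rmdir(dir);
    return rc;
}
```

With the two files created by `dotfile_demo`, the strict filter counts both, while the shortcut counts only `visible`: the accidental "hidden file" behaviour in miniature.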
–jeroen