Why `df` and `du` Don't Match
Linux users are often surprised that the two command line tools to report disk utilisation give different results for the amount of space used. However, there are very good reasons for this disparity because the two tools work in completely different ways. The results should never be expected to match!
df reads statistics about total and free blocks from the filesystem driver through the statfs() system call. It subtracts the number of free blocks from the total to deduce the space used.
du reads the number of blocks used to store each file and directory using the stat() system call. It can sum these for every entry in the filesystem to deduce the space used.
Effectively df asks about the filesystem as a whole, and du asks about each entry individually.
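The two approaches can be sketched in Python, which exposes both system calls through the os module. This is a minimal illustration rather than a reimplementation of either tool: the function names df_used_bytes and du_used_bytes are invented for this example, and the 512-byte unit for st_blocks is the Linux convention.

```python
import os

def df_used_bytes(path):
    """Filesystem-wide view: total blocks minus free blocks, as df does."""
    vfs = os.statvfs(path)
    return (vfs.f_blocks - vfs.f_bfree) * vfs.f_frsize

def du_used_bytes(root):
    """Per-entry view: sum the allocated blocks of everything under root."""
    seen = set()   # (device, inode) pairs, so hard links are counted once
    total = 0

    def add(path):
        nonlocal total
        try:
            st = os.lstat(path)   # lstat: do not follow symbolic links
        except OSError:
            return                # entry vanished or is unreadable; skip it
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            total += st.st_blocks * 512   # st_blocks is in 512-byte units on Linux

    add(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            add(os.path.join(dirpath, name))
    return total

if __name__ == "__main__":
    print("df-style used bytes:", df_used_bytes("."))
    print("du-style used bytes:", du_used_bytes("."))
```

Running both functions on the root of a filesystem will generally give two different numbers, for the reasons listed below.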
The reasons for the disparity are thus:
Space that is in use but has no corresponding directory entry cannot be queried by du:
- Filesystems always require space to organise data within themselves. In ext2/3, this is in the form of tables allocated when the filesystem is first formatted. In ReiserFS, this is a tree, which grows as more files are written to the system.
- Journals, used to make filesystems more robust, are not (usually) visible as filesystem entries.
- Filesystems can contain file data that does not (currently) correspond to a directory entry. Linux won't free the space used by a file while any process still has it open, even if the file's directory entry has already been removed. Some filesystem implementations won't create a directory entry for a newly written file until the file is closed.
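The open-but-deleted case is easy to reproduce. The sketch below writes a temporary file, removes its directory entry while keeping the descriptor open, and then inspects it with fstat(). The helper name unlinked_file_stats is made up for this example, and the exact link and block counts depend on the filesystem (some network filesystems, for instance, rename the file instead of removing it).

```python
import os
import tempfile

def unlinked_file_stats(size=1 << 20):
    """Write `size` bytes, unlink the file while it is open, and stat it."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size)
        f.flush()
        os.fsync(f.fileno())        # push the data out so blocks are allocated
        os.unlink(path)             # remove the only directory entry
        st = os.fstat(f.fileno())   # the open descriptor still reaches the file
        return {
            "listed": os.path.exists(path),  # can du find it by name?
            "links": st.st_nlink,            # remaining directory entries
            "size": st.st_size,
            "blocks": st.st_blocks,          # space still held on disk
        }
    # Leaving the with-block closes the file; only then is the space released.

if __name__ == "__main__":
    print(unlinked_file_stats())
```

While the file is open in this state, its blocks count towards the df total, but du has no path through which to find them.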
Some drivers can't represent the complexity of their internal state through the stat/statfs data structures. ReiserFS, for example, does tail packing, which uses one block for more than one file. ReiserFS reports the number of blocks a file spans, so du counts shared blocks more than once and overestimates.
Some drivers simply don't know how storage is being done. Network file systems, for example, usually know about the space free and total size, but may not be told how much is used for each file. They may provide an uninformed guess instead. du will rely on these guesses.
- One or more files may have been deleted, but process(es) still have them open. They still occupy space on the filesystem, but do not appear in a directory listing: they will be included in the df total, but not the du total.
- Another filesystem is mounted on a directory within the filesystem. This can cause du to give misleading values in two cases:
  - You have not used the -x flag to du and have therefore included file sizes from the other filesystem.
  - There are files which have been hidden by mounting the other filesystem on top of them, and which are therefore not included in the du total.
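Whether a directory belongs to another filesystem can usually be detected by comparing the device number (st_dev) that stat() reports for it with that of the starting directory. The sketch below lists the directories under a root where the device number changes and does not descend into them. The function name mount_points_under is hypothetical, and the approach is a simplification: a bind mount of the same filesystem has the same device number and would not be detected.

```python
import os

def mount_points_under(root):
    """Return directories below root that sit on a different device."""
    root_dev = os.lstat(root).st_dev
    found = []
    for dirpath, dirnames, _ in os.walk(root):
        keep = []
        for name in dirnames:
            full = os.path.join(dirpath, name)
            try:
                dev = os.lstat(full).st_dev
            except OSError:
                continue               # unreadable or vanished; skip it
            if dev != root_dev:
                found.append(full)     # another filesystem starts here
            else:
                keep.append(name)
        dirnames[:] = keep             # prune, so os.walk stays on one device
    return found

if __name__ == "__main__":
    for path in mount_points_under("/"):
        print(path)
```

Any files that were in those directories before the other filesystem was mounted remain on disk and count towards df, but they stay unreachable by path until it is unmounted.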
Note also that not all of the space df reports as free may be available for storing files. On ReiserFS, filesystem internals grow as the number of files stored grows. On ext2/3, a number of blocks are reserved for processes running as root.
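The reserved blocks show up in the statfs data as two separate counters: f_bfree counts all free blocks, while f_bavail counts only those an unprivileged process may use. A minimal sketch, in which the function name free_space_report is invented for this example and the difference between the counters is taken as an approximation of the reserve:

```python
import os

def free_space_report(path):
    """Compare free blocks with blocks available to unprivileged users."""
    vfs = os.statvfs(path)
    free = vfs.f_bfree * vfs.f_frsize        # all free blocks
    available = vfs.f_bavail * vfs.f_frsize  # free blocks non-root may use
    return {
        "free_bytes": free,
        "available_bytes": available,
        "reserved_bytes": free - available,  # roughly the root-only reserve
    }

if __name__ == "__main__":
    for key, value in free_space_report("/").items():
        print(f"{key}: {value}")
```

On filesystems without a reserve, the two counters are normally equal and the difference is zero.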