BTRFS: Finding and fixing highly fragmented files
BTRFS fragmentation can hurt the performance of your system.
It is possible to run btrfs fi defrag on your whole filesystem, but defragmenting breaks the extent sharing between snapshots and the live data, so all your snapshots end up duplicating the data. It also causes a lot of IO, so this is nothing you want to do on your production server on a regular basis.
Today I discovered that there is a Linux tool called filefrag, which reports how many extents (fragments) a file consists of.
So I thought, "why not try to find the most fragmented files and fix just these?" Here you go:
Find the most fragmented files on your system:
find / -xdev -type f -print0 | xargs -0 filefrag 2>/dev/null | sed 's/^\(.*\): \([0-9]\+\) extent.*/\2 \1/' | awk -F ' ' '$1 > 500' | sort -n -r | head -30
You should review this list. If there is something with 10000+ extents, it is a candidate to be flagged as nodatacow (the No_COW file attribute, set with chattr +C). In my case, I discovered that the fail2ban sqlite database was using 170k extents, which is a lot!
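Flagging an existing file is a bit awkward: according to the chattr man page, the C attribute on btrfs should be set on new or empty files, so the usual workaround is to create an empty file, flag it, and copy the data into it. Below is a minimal sketch of that idea, not a tested migration tool. The function name nocow_recreate, the .nocow suffix and the DRY_RUN switch are made up for this example, and it assumes that nothing has the file open (stop the service first) and that GNU coreutils are installed.

```shell
#!/bin/sh
# Sketch: recreate a file as a No_COW file on btrfs.
# Assumptions: the file is not in use, and it lives on a btrfs filesystem.
# Set DRY_RUN=1 to print the commands instead of running them.
nocow_recreate() {
    src=$1
    tmp=$src.nocow    # hypothetical temporary name next to the original

    run=''
    if [ -n "${DRY_RUN:-}" ]; then
        run=echo
    fi

    # 1. create a new, empty file
    # 2. set No_COW while it is still empty
    # 3. copy the data into the flagged file without truncating/recreating it
    # 4. carry over owner and mode, then swap the files
    $run touch "$tmp" &&
    $run chattr +C "$tmp" &&
    $run dd if="$src" of="$tmp" bs=1M conv=notrunc status=none &&
    $run chown --reference="$src" "$tmp" &&
    $run chmod --reference="$src" "$tmp" &&
    $run mv "$tmp" "$src"
}
```

Running it as DRY_RUN=1 nocow_recreate /path/to/file first shows what would happen. Keep in mind that No_COW files also lose btrfs checksumming and compression, so this is a trade-off rather than a free win.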
If you think there is nothing to nodatacow, you can just go ahead and defrag all files above the 500-extent threshold:
find / -xdev -type f -print0 | xargs -0 filefrag 2>/dev/null | sed 's/^\(.*\): \([0-9]\+\) extent.*/\2 \1/' | awk -F ' ' '$1 > 500' | cut -d ' ' -f2- | xargs -r -d '\n' btrfs fi defrag -f -v
Short explanation of the command: find gets all files on the specified path (/) without descending into other mounted filesystems (-xdev). Then filefrag determines the fragmentation, and the sed command reformats the output so that the extent count is in first position, followed by the filename. Then awk filters that list, keeping only files that have more than 500 extents. After that is done, the output is "cut" to only contain the filenames and passed to btrfs defrag for defragmentation. -f flushes the data to disk after each file, and -v on the defrag command prints out all processed files.
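The parsing and filtering step can also be wrapped in a small shell function, which makes it easier to try out on sample input before pointing it at a real filesystem. This is only a sketch: the name most_fragmented is made up here, and it assumes filefrag prints lines of the form "path: N extents found" (or "1 extent found").

```shell
#!/bin/sh
# Read filefrag output on stdin and print "<extents> <path>" for every file
# with more than MIN extents (default 500), most fragmented first.
most_fragmented() {
    min=${1:-500}
    # Move the extent count in front of the path; lines that do not look
    # like filefrag output are dropped by -n ... p.
    sed -n 's/^\(.*\): \([0-9][0-9]*\) extents\{0,1\} found$/\2 \1/p' |
        awk -v min="$min" '$1 + 0 > min + 0' |
        sort -n -r
}
```

It would be used like this: find / -xdev -type f -exec filefrag {} + 2>/dev/null | most_fragmented 500 | head -30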
Also take a look at the long-term IO usage before and after the defrag to see how big the difference is in the real world.