How to find growing files on Linux

This comes in handy when you are logged in to a machine you do not know much about and can see that its disk space is running out. I think this scenario is pretty common: I have seen it with ordinary logs, but also with buggy applications producing tons of data.

Specific files

Create a temporary file, and use the find command to find files newer than it.

Create an empty file:

$ touch newer_than_this_file

Wait a little while so that whatever is filling the disk has time to write, then look for files on the whole machine ("/") that are newer ("-newer") than the file you just created ("newer_than_this_file"). Skip everything under /proc/ (-not -path "/proc/*") and run ls -lh on each file found:

$ find / -newer newer_than_this_file -not -path "/proc/*" -exec ls -lh {} \;

We can also add a size filter if we only care about big files (larger than 1 MiB in this example):

$ find / -newer newer_than_this_file -size +1M -not -path "/proc/*" -exec ls -lh {} \;
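If you do this often, the find call can be wrapped in a small function. This is only a sketch, assuming bash and GNU find; the name list_newer_files and its arguments are made up for illustration. Compared to the commands above it adds -type f, so that ls is not run on whole directories, and it hides error messages from unreadable paths.

```shell
#!/usr/bin/env bash
# Sketch: list regular files under a directory that were modified after a
# marker file was last touched. list_newer_files is a hypothetical name.

list_newer_files() {
    local dir=$1 marker=$2
    # Optional size filter in find syntax, e.g. "+1M" for larger than 1 MiB.
    # The default "+0c" matches any non-empty file.
    local min_size=${3:-+0c}

    # -type f limits the results to regular files.
    # "-exec ls -lh {} +" runs ls once per batch of files, not once per file.
    # 2>/dev/null discards errors such as "Permission denied".
    find "$dir" -type f -newer "$marker" -size "$min_size" \
        -not -path '/proc/*' -exec ls -lh {} + 2>/dev/null
}
```

After touching newer_than_this_file and waiting a bit, it could be called as: list_newer_files / newer_than_this_file +1M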

Open files

Use the lsof (list open files) command to see which files your machine is using right now:

$ lsof

To make things easier, I have put together this one-liner, which does a few things:

$ lsof / > lsof_1.txt; sleep 15; lsof / > lsof_2.txt; sdiff -w250 lsof_1.txt lsof_2.txt > lsof_difference.txt; grep -E '[|<>]' lsof_difference.txt
  • It calls lsof and writes the list of open files to a file called lsof_1.txt. Called with /, lsof lists only the open files on the root file system.
  • Then it sleeps (waits) for 15 seconds.
  • Then it calls lsof again, the same way, and writes the list of open files to a new file called lsof_2.txt.
  • Then it calls sdiff to make a side-by-side comparison of the two files, and redirects stdout to a file called lsof_difference.txt.
  • Finally it uses grep -E (extended regular expressions, the replacement for the deprecated egrep) to keep only the lines containing "|", "<", or ">".

This prints every difference between the two snapshots, which were taken 15 seconds apart.

The sdiff symbols, briefly explained:

< - means that the line only exists in file 1
> - means that the line only exists in file 2
| - means that the lines from file 1 and file 2 are different
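The snapshot-and-compare idea can also be written as a small script that cleans up after itself. This is a rough sketch, assuming bash, lsof, and sdiff from GNU diffutils; the function names changed_lines and open_file_changes are invented here. It uses temporary files instead of lsof_1.txt and lsof_2.txt, so nothing is left behind in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: compare two snapshots of open files taken a few seconds apart.
# changed_lines and open_file_changes are hypothetical helper names.

# Print only the sdiff rows marked "<", ">" or "|" for two snapshot files.
changed_lines() {
    local before=$1 after=$2
    sdiff -w250 "$before" "$after" | grep -E '[|<>]'
}

# Take two lsof snapshots of a mount point and show what changed.
open_file_changes() {
    local mount=${1:-/} interval=${2:-15}
    local before after
    before=$(mktemp) || return 1
    after=$(mktemp) || return 1

    lsof "$mount" > "$before"
    sleep "$interval"
    lsof "$mount" > "$after"

    changed_lines "$before" "$after"
    rm -f "$before" "$after"
}
```

Calling open_file_changes / 15 reproduces the one-liner. changed_lines is kept separate so it can also be pointed at two snapshot files you already have.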

Which processes

iotop (top-like diagnostics for I/O) is great for seeing which processes/services/applications are writing the most data. It usually needs root privileges. Just use it like this:

$ iotop
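If you would rather have output you can save or paste into a ticket, iotop also has a batch mode. The wrapper below is only a sketch; the name io_samples and its defaults (3 samples, 5 seconds apart) are my own assumptions. It relies on iotop's -o, -b, -n and -d options.

```shell
#!/usr/bin/env bash
# Sketch: take a few non-interactive iotop samples.
# io_samples is a hypothetical helper name; it normally has to run as root.

io_samples() {
    local iterations=${1:-3} delay=${2:-5}

    if ! command -v iotop >/dev/null 2>&1; then
        echo "iotop is not installed" >&2
        return 1
    fi

    # -o: only show processes that are actually doing I/O
    # -b: batch (non-interactive) mode
    # -n: number of samples, -d: seconds between samples
    iotop -o -b -n "$iterations" -d "$delay"
}
```

For example, io_samples 5 2 takes five samples two seconds apart.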

Go for a specific folder

The du command is very handy here. Pick a directory, e.g. the logs folder, and check the size of each subdirectory:

$ du -h /var/log/
78M    /var/log/apache2
24K    /var/log/redis
81M    /var/log/munin
28K    /var/log/mongodb
4.0K    /var/log/iptraf
4.0K    /var/log/unattended-upgrades
12K    /var/log/fsck
8.0K    /var/log/dbconfig-common
4.0K    /var/log/mysql
4.0K    /var/log/varnish
108K    /var/log/proftpd
4.0K    /var/log/atop
12K    /var/log/ajenti
4.0K    /var/log/samba
4.0K    /var/log/sysstat
176K    /var/log/apt
4.0K    /var/log/puppet
4.0K    /var/log/news
252K    /var/log/nginx
233M    /var/log/
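With many subdirectories it is easier to read the list sorted by size. A minimal sketch, assuming bash; biggest_dirs is a made-up name. It uses du -k instead of -h so that a plain numeric sort works, which means the sizes are shown in KiB.

```shell
#!/usr/bin/env bash
# Sketch: list the largest directories under a starting point, biggest first.
# biggest_dirs is a hypothetical helper name.

biggest_dirs() {
    local dir=${1:-/var/log} count=${2:-10}
    # -k prints sizes in KiB so that "sort -rn" can order them numerically.
    # Errors from unreadable directories are discarded.
    du -k "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

The first line of the output is the total for the starting directory itself, followed by the biggest subdirectories.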

Then you can simply run ls on the directory you find evil (-lathr is a good set of flags: long listing format, hidden files, sort by modification time, human-readable sizes, and reverse order, so the most recently modified files end up at the bottom of the list):

$ ls -lathr 
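Once a suspect shows up at the bottom of that listing, you can check whether it is really still growing by sampling its size twice. This is just a sketch, assuming bash; growth_bytes is a name I made up, and the default interval of 5 seconds is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: report how many bytes a file grew during a short interval.
# growth_bytes is a hypothetical helper name.

growth_bytes() {
    local file=$1 interval=${2:-5}
    local before after

    before=$(wc -c < "$file") || return 1
    sleep "$interval"
    after=$(wc -c < "$file") || return 1

    # Positive: the file grew. Zero: no change. Negative: it was truncated.
    echo $(( after - before ))
}
```

For example, growth_bytes /var/log/syslog 10 prints the number of bytes added to that file in ten seconds (the path is only an example).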

Good luck hunting them down.