Message-ID: <20150610013902.GA176908@mail.thefacebook.com>
Date:	Tue, 9 Jun 2015 18:39:02 -0700
From:	Calvin Owens <calvinowens@...com>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	Alexey Dobriyan <adobriyan@...il.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Al Viro <viro@...iv.linux.org.uk>,
	Miklos Szeredi <miklos@...redi.hu>,
	Zefan Li <lizefan@...wei.com>, Oleg Nesterov <oleg@...hat.com>,
	Joe Perches <joe@...ches.com>,
	David Howells <dhowells@...hat.com>,
	<linux-kernel@...r.kernel.org>, <kernel-team@...com>,
	Andy Lutomirski <luto@...capital.net>,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Kees Cook <keescook@...omium.org>,
	"Kirill A. Shutemov" <kirill@...temov.name>
Subject: Re: [PATCH v6] procfs: Always expose /proc/<pid>/map_files/ and make
 it readable

On Tuesday 06/09 at 14:13 -0700, Andrew Morton wrote:
> On Mon, 8 Jun 2015 20:39:33 -0700 Calvin Owens <calvinowens@...com> wrote:
> 
> > Currently, /proc/<pid>/map_files/ is restricted to CAP_SYS_ADMIN, and
> > is only exposed if CONFIG_CHECKPOINT_RESTORE is set.
> > 
> > This interface is very useful because it allows userspace to stat()
> > deleted files that are still mapped by some process, which enables a
> > much quicker and more accurate answer to the question "How much disk
> > space is being consumed by files that are deleted but still mapped?"
> > than is currently possible.
> 
> Why is that information useful?
> 
> I could perhaps think of some use for "How much disk space is being
> consumed by files that are deleted but still open", but to count the
> mmapped-then-unlinked files while excluding the opened-then-unlinked
> files seems damned peculiar.

Let's phrase the question a bit more generically:

"How much disk space is being consumed by files that have been
unlinked, but are still referenced by some process?"

There are two pieces to this problem:
	1) Unlinked files that are still open (whether mapped or not)
	2) Unlinked files that are not open, but are still mapped

You can track down everything in (1) using /proc/<pid>/fd/*, and you
can use stat() to figure out how much space they're using.
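For (1), a minimal sketch of that walk in C. It assumes Linux procfs semantics, where stat() on a /proc/<pid>/fd/ entry reports on the open file itself, and it treats st_nlink == 0 as "unlinked". The struct and function names are hypothetical. Allocated space comes from st_blocks, which stat(2) counts in 512-byte units.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical totals for unlinked files a process still has open. */
struct unlinked_open {
	unsigned long count;        /* fds whose target has st_nlink == 0 */
	long long apparent_bytes;   /* sum of st_size */
	long long allocated_bytes;  /* sum of st_blocks * 512 */
};

/*
 * Walk /proc/<pid>/fd/ and stat() every entry. stat() follows the
 * magic link to the open file itself, so it still works after the
 * file has been unlinked. A file open on several fds is counted once
 * per fd; this sketch does not deduplicate.
 */
static int sum_unlinked_open_files(pid_t pid, struct unlinked_open *out)
{
	char dirpath[64], path[512];
	struct dirent *de;
	struct stat st;
	DIR *dir;

	assert(out != NULL);
	memset(out, 0, sizeof(*out));

	snprintf(dirpath, sizeof(dirpath), "/proc/%d/fd", (int)pid);
	dir = opendir(dirpath);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		if (de->d_name[0] == '.')
			continue;               /* skip "." and ".." */
		snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
		if (stat(path, &st) == -1)
			continue;               /* fd closed meanwhile, or no access */
		if (S_ISREG(st.st_mode) && st.st_nlink == 0) {
			out->count++;
			out->apparent_bytes += st.st_size;
			out->allocated_bytes += (long long)st.st_blocks * 512;
		}
	}
	closedir(dir);
	return 0;
}
```

Reading another user's /proc/<pid>/fd/ needs ptrace-level access to that process, so a system-wide audit would typically run with privileges.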

But directly measuring how much space (2) consumes is not currently
possible from userspace, because there's no way to stat() the files.
You can get the inode number from /proc/<pid>/maps, but that still
doesn't get you anywhere: the file has been unlinked, so there is no
path left to stat() it through.
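A sketch of what /proc/<pid>/maps does offer, assuming the usual "start-end perms offset major:minor inode pathname" line layout and the " (deleted)" marker the kernel appends to the names of unlinked files. list_deleted_mappings() is a hypothetical helper that takes an already-opened maps stream. It recovers device and inode numbers, but nothing it finds can be handed to stat().

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Scan a /proc/<pid>/maps stream for file-backed mappings whose pathname
 * ends in " (deleted)", optionally print them to report, and return how
 * many were found. Caveats: a file whose real name ends in " (deleted)"
 * is misreported, and a file mapped at several ranges is reported once
 * per range.
 */
static int list_deleted_mappings(FILE *maps, FILE *report)
{
	static const char suffix[] = " (deleted)";
	const size_t slen = sizeof(suffix) - 1;
	unsigned long start, end, offset, inode;
	unsigned int major, minor;
	char perms[5], *line = NULL;
	size_t cap = 0;
	int found = 0;

	assert(maps != NULL);
	while (getline(&line, &cap, maps) != -1) {
		size_t len;
		char *name;
		int n = 0;

		/* start-end perms offset major:minor inode [pathname] */
		if (sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %n", &start, &end,
			   perms, &offset, &major, &minor, &inode, &n) != 7 || n == 0)
			continue;
		name = line + n;
		len = strlen(name);
		if (len > 0 && name[len - 1] == '\n')
			name[--len] = '\0';
		/* inode 0 means an anonymous mapping such as [heap] or [stack]. */
		if (inode == 0 || len < slen || strcmp(name + len - slen, suffix) != 0)
			continue;
		found++;
		if (report)
			fprintf(report, "%s: dev %02x:%02x inode %lu at %lx-%lx\n",
				name, major, minor, inode, start, end);
	}
	free(line);
	return found;
}
```

Usage would look like `list_deleted_mappings(fopen("/proc/5842/maps", "r"), stdout)` (checking the fopen() result first). For the second run below it would report inode 184946 on device 00:20, which is exactly what lsof shows, and no size.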

So I'm not looking to measure (2) while excluding (1): I'm looking for
a way to measure (2) directly at all.

I say "directly" here, and "quicker and more accurate" in the original
message, because there is a very ugly way to answer this question right
now: sum up the blocks used by every file on the disk and subtract that
from the usage statfs() reports. This obviously stinks, and it becomes
untenable once your filesystem is large enough.
 
> IOW, this changelog failed to explain the value of the patch.  Bad
> changelog!  Please sell it to us.  Preferably with real-world use
> cases.

The real-world use case is catching long-lived processes that leak
references to temporary files and waste space on the disk. When such
processes leak file-backed mappings, this wasted space is especially
difficult to detect until it gets out of hand. The map_files/
interface eliminates this difficulty.
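Assuming map_files/ is exposed and readable as this patch proposes, measuring (2) becomes the same kind of walk used for /proc/<pid>/fd/. The sketch below is hypothetical: the names and the fixed-size deduplication table are illustrative only. Depending on kernel version and configuration, following map_files/ links may still require privileges. In that case stat() fails and the entry is only counted as skipped.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TRACKED 256   /* hypothetical dedup table size for this sketch */

struct unlinked_mapped {
	unsigned long count;        /* distinct mapped files with st_nlink == 0 */
	long long apparent_bytes;   /* sum of st_size */
	long long allocated_bytes;  /* sum of st_blocks * 512 */
	unsigned long skipped;      /* entries stat() failed on */
};

/*
 * Each entry in /proc/<pid>/map_files/ is a symlink named after one
 * mapped address range ("start-end" in hex). stat() follows it to the
 * mapped file even after the file has been unlinked. A file mapped at
 * several ranges is counted once, keyed on (st_dev, st_ino), as long
 * as the table has room.
 */
static int sum_unlinked_mapped_files(pid_t pid, struct unlinked_mapped *out)
{
	struct { dev_t dev; ino_t ino; } seen[MAX_TRACKED];
	unsigned long nseen = 0, i;
	char dirpath[64], path[512];
	struct dirent *de;
	struct stat st;
	DIR *dir;

	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	snprintf(dirpath, sizeof(dirpath), "/proc/%d/map_files", (int)pid);
	dir = opendir(dirpath);
	if (!dir)
		return -1;              /* errno set by opendir() */

	while ((de = readdir(dir)) != NULL) {
		if (de->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
		if (stat(path, &st) == -1) {
			out->skipped++; /* no permission, or range unmapped meanwhile */
			continue;
		}
		if (!S_ISREG(st.st_mode) || st.st_nlink != 0)
			continue;
		for (i = 0; i < nseen; i++)
			if (seen[i].dev == st.st_dev && seen[i].ino == st.st_ino)
				break;
		if (i < nseen)
			continue;       /* already counted at another range */
		if (nseen < MAX_TRACKED) {
			seen[nseen].dev = st.st_dev;
			seen[nseen].ino = st.st_ino;
			nseen++;
		}
		out->count++;
		out->apparent_bytes += st.st_size;
		out->allocated_bytes += (long long)st.st_blocks * 512;
	}
	closedir(dir);
	return 0;
}
```

Adding this to the (1) total gives the direct answer to the generic question without walking the whole filesystem.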

I've included a little test program at the end of this message to
illustrate what I'm getting at here. It creates a file at
/tmp/DELETEDFILE, maps it, and closes the fd; built with -DDO_UNLINK,
it also unlinks the file:

	calvinowens@...dn:~$ gcc test.c 
	calvinowens@...dn:~$ ./a.out &
	[1] 5832
	Holding mapping at 0x7fe74d1ea000
	calvinowens@...dn:~$ lsof -p `pgrep a.out`
	COMMAND  PID        USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
	a.out   5832 calvinowens  cwd    DIR  254,1     4096 3413033 /home/calvinowens
	a.out   5832 calvinowens  rtd    DIR  254,1     4096       2 /
	a.out   5832 calvinowens  txt    REG  254,1     7512 3408268 /home/calvinowens/a.out
	a.out   5832 calvinowens  mem    REG  254,1  1729984 4456767 /lib/x86_64-linux-gnu/libc-2.19.so
	a.out   5832 calvinowens  mem    REG  254,1   140928 4456619 /lib/x86_64-linux-gnu/ld-2.19.so
	a.out   5832 calvinowens  mem    REG   0,32    32768  184946 /tmp/DELETEDFILE
	a.out   5832 calvinowens    0u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5832 calvinowens    1u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5832 calvinowens    2u   CHR  136,2      0t0       5 /dev/pts/2
	calvinowens@...dn:~$ killall a.out
	[1]+  Terminated              ./a.out
	calvinowens@...dn:~$ gcc -DDO_UNLINK test.c 
	calvinowens@...dn:~$ ./a.out &
	[1] 5842
	Holding mapping at 0x7fec8ae63000
	calvinowens@...dn:~$ lsof -p `pgrep a.out`
	COMMAND  PID        USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
	a.out   5842 calvinowens  cwd    DIR  254,1     4096 3413033 /home/calvinowens
	a.out   5842 calvinowens  rtd    DIR  254,1     4096       2 /
	a.out   5842 calvinowens  txt    REG  254,1     7640 3408268 /home/calvinowens/a.out
	a.out   5842 calvinowens  mem    REG  254,1  1729984 4456767 /lib/x86_64-linux-gnu/libc-2.19.so
	a.out   5842 calvinowens  mem    REG  254,1   140928 4456619 /lib/x86_64-linux-gnu/ld-2.19.so
	a.out   5842 calvinowens  DEL    REG   0,32           184946 /tmp/DELETEDFILE
	a.out   5842 calvinowens    0u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5842 calvinowens    1u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5842 calvinowens    2u   CHR  136,2      0t0       5 /dev/pts/2

Notice the gap under "SIZE/OFF" in the second output? That's because
lsof has no way at all to determine the leaked file's size. That's the
"hole" in functionality I'm trying to fill with this patch.
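With map_files/ available, a tool like lsof could fill in SIZE/OFF for a DEL entry by stat()ing the map_files/ entry that matches the mapping's address range. A minimal sketch, assuming entries are named with the same hex range that /proc/<pid>/maps prints (minus its zero padding), e.g. 7fec8ae63000-7fec8ae64000 for the mapping above. mapped_file_size() is hypothetical, and as noted for map_files/ generally, the stat() may need privileges on some kernels.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Look up the file behind one mapping, identified by the start and end
 * addresses from /proc/<pid>/maps, through its map_files/ entry.
 * Returns 0 and stores st_size on success, -1 with errno set otherwise.
 */
static int mapped_file_size(pid_t pid, unsigned long start,
			    unsigned long end, off_t *size)
{
	char path[96];
	struct stat st;

	assert(size != NULL);
	snprintf(path, sizeof(path), "/proc/%d/map_files/%lx-%lx",
		 (int)pid, start, end);
	if (stat(path, &st) == -1)
		return -1;
	*size = st.st_size;
	return 0;
}
```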

Does that all seem sensible?

Thanks,
Calvin

--
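#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <limits.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

/*
 * Create a 32 KiB file, map its first page shared, then close the fd so
 * that only the mapping keeps the file alive. Build with -DDO_UNLINK to
 * unlink the file too, leaving a file that is deleted but still mapped.
 */
int main(void)
{
	int ret, fd;
	void *map;

	fd = open("/tmp/DELETEDFILE", O_CREAT|O_TRUNC|O_RDWR, 0777);
	if (fd == -1)
		return -1;

	/* Give the file a known size (lsof's SIZE/OFF in the first run). */
	ret = ftruncate(fd, 32768);
	if (ret == -1)
		return -1;

	/* MAP_POPULATE prefaults the mapped page up front. */
	map = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE,
			fd, 0);
	if (map == MAP_FAILED)
		return -1;

	/* From here on, only the mapping references the file. */
	close(fd);
#ifdef DO_UNLINK
	/* Drop the last name; the inode lives until the mapping goes away. */
	unlink("/tmp/DELETEDFILE");
#endif

	printf("Holding mapping at %p\n", map);

	/* Block forever so the process can be inspected with lsof. */
	while (1)
		sleep(UINT_MAX);
}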