linux-kernel - Re: [PATCH] proc: readdir race fix (take 3)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <200609061101.11544.jdelvare@suse.de>
Date:	Wed, 6 Sep 2006 11:01:11 +0200
From:	Jean Delvare <jdelvare@...e.de>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	Andrew Morton <akpm@...l.org>, Oleg Nesterov <oleg@...sign.ru>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-kernel@...r.kernel.org, ak@...e.de,
	Albert Cahalan <acahalan@...il.com>, Paul Jackson <pj@....com>
Subject: Re: [PATCH] proc: readdir race fix (take 3)

On Tuesday 5 September 2006 16:52, Eric W. Biederman wrote:
> The problem: An opendir, readdir, closedir sequence can fail to report
> process ids that are continually in use throughout the sequence of
> system calls.  For this race to trigger the process that
> proc_pid_readdir stops at must exit before readdir is called again.
>
> This can cause ps to fail to report processes, and  it is in violation
> of posix guarantees and normal application expectations with respect
> to readdir.
>
> Currently there is no way to work around this problem in user space
> short of providing a gargantuan buffer to user space so the directory
> read all happens in on system call.
>
> This patch implements the normal directory semantics for proc,
> that guarantee that a directory entry that is neither created nor
> destroyed while reading the directory entry will be returned.  For
> directory that are either created or destroyed during the readdir you
> may or may not see them.  Furthermore you may seek to a directory
> offset you have previously seen.
>
> These are the guarantee that ext[23] provides and that posix requires,
> and more importantly that user space expects. Plus it is a simple
> semantic to implement reliable service.  It is just a matter of
> calling readdir a second time if you are wondering if something new
> has show up.
>
> These better semantics are implemented by scanning through the
> pids in numerical order and by making the file offset a pid
> plus a fixed offset.
>
> The pid scan happens on the pid bitmap, which when you look at it is
> remarkably efficient for a brute force algorithm.  Given that a typical
> cache line is 64 bytes and thus covers space for 64*8 == 200 pids. 
> There are only 40 cache lines for the entire 32K pid space.  A typical
> system will have 100 pids or more so this is actually fewer cache lines
> we have to look at to scan a linked list, and the worst case of having
> to scan the entire pid bitmap is pretty reasonable.
>
> If we need something more efficient we can go to a more efficient data
> structure for indexing the pids, but for now what we have should be
> sufficient.
>
> In addition this takes no additional locks and is actually less
> code than what we are doing now.
>
> Also another very subtle bug in this area has been fixed.  It is
> possible to catch a task in the middle of de_thread where a thread is
> assuming the thread of it's thread group leader.  This patch carefully
> handles that case so if we hit it we don't fail to return the pid, that
> is undergoing the de_thread dance.
>
> This patch is against 2.6.18-rc6 and it should be relatively straight
> forward to backport to older kernels as well.
>
> Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> for
> providing the first fix, pointing this out and working on it.

Eric, Kame, thanks a lot for working on this. I'll be giving some good 
testing to this patch today, and will return back to you when I'm done.

-- 
Jean Delvare
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/