Message-ID: <4BE3503A.2000309@google.com>
Date:	Thu, 06 May 2010 16:26:50 -0700
From:	Mike Waychison <mikew@...gle.com>
To:	Michel Lespinasse <walken@...gle.com>
CC:	David Howells <dhowells@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux-MM <linux-mm@...ck.org>, Ying Han <yinghan@...gle.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: rwsem: down_read_unfair() proposal

Michel Lespinasse wrote:
> On Wed, May 05, 2010 at 11:03:40AM +0100, David Howells wrote:
>> If the system is as heavily loaded as you say, how do you prevent
>> writer starvation?  Or do things just grind along until sufficient
>> threads are queued waiting for a write lock?
> 
> Reader/Writer fairness is not disabled in the general case - it only is
> for a few specific readers such as /proc/<pid>/maps. In particular, the
> do_page_fault path, which holds a read lock on mmap_sem for potentially long
> (~disk latency) periods of time, still uses a fair down_read() call.
> In comparison, the /proc/<pid>/maps path which we made unfair does not
> normally hold the mmap_sem for very long (it does not end up hitting disk);
> so it's been working out well for us in practice.
> 
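
To make the fair/unfair distinction in the quoted text concrete, here is a
toy userspace model of the admission rule only. It is a sketch, not the
kernel's rwsem code and not the proposed patch: every toy_* name is
hypothetical, there is no real blocking or wakeup, and "fair" is assumed to
mean simply that a reader will not pass a writer that is already queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model of reader/writer admission rules.
 * NOT the kernel rwsem; all names are made up for illustration. */
struct toy_rwsem {
	int  active_readers;   /* readers currently holding the lock */
	bool writer_active;    /* a writer currently holds the lock */
	int  waiting_writers;  /* writers queued behind current holders */
};

/* Fair read attempt: refuse to jump ahead of any queued writer. */
static bool toy_down_read_fair(struct toy_rwsem *sem)
{
	if (sem->writer_active || sem->waiting_writers > 0)
		return false;
	sem->active_readers++;
	return true;
}

/* Unfair read attempt: only an active writer keeps us out;
 * queued writers are bypassed. */
static bool toy_down_read_unfair(struct toy_rwsem *sem)
{
	if (sem->writer_active)
		return false;
	sem->active_readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->active_readers > 0);
	sem->active_readers--;
}

/* Write attempt: take the lock if it is idle, otherwise join the queue. */
static bool toy_down_write(struct toy_rwsem *sem)
{
	if (sem->writer_active || sem->active_readers > 0) {
		sem->waiting_writers++;
		return false;
	}
	sem->writer_active = true;
	return true;
}

/* Scenario: a page fault holds the read lock for a long time, a writer
 * queues behind it, then a /proc/<pid>/maps style reader shows up.
 * Returns true if that last reader gets in without waiting. */
bool toy_scenario(bool maps_reader_is_unfair)
{
	struct toy_rwsem sem = { 0, false, 0 };
	bool got_in;

	toy_down_read_fair(&sem);        /* slow page-fault reader */
	toy_down_write(&sem);            /* writer arrives and queues */

	got_in = maps_reader_is_unfair ? toy_down_read_unfair(&sem)
				       : toy_down_read_fair(&sem);
	if (got_in)
		toy_up_read(&sem);       /* short hold, then release */

	toy_up_read(&sem);               /* page fault finally completes */
	return got_in;
}
```

In this model toy_scenario(false) leaves the maps reader stuck behind the
queued writer, while toy_scenario(true) lets it through; the writer waits
on the page-fault reader either way.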

FWIW, these sorts of block-ups are usually really pronounced on machines
with hard drives that take _forever_ to respond to SMART commands (which
are done via PIO, and which can serialize many drives when they are
hidden behind a port multiplier).  We've seen cases where hard faults
take unusually long (~10 seconds?) on an otherwise non-busy machine.

The other case where mmap_sem gives us problems, from a cluster
monitoring perspective, is when we get blocked up behind a task that is
having problems dying from OOM.  We have a variety of hacks used
internally to cover these cases, though I think we (David and I?) figured
that it'd make more sense to fix the dependencies on
down_read(&current->mmap_sem) in the do_exit() path.  For instance, it
really makes no sense to coredump when we are being OOM killed (and thus
we should be able to skip the mmap_sem dependency there).

Mike Waychison