linux-kernel - Re: [RFC][PATCH 6/8] mm: handle_speculative


Date:	Thu, 7 Jan 2010 09:49:45 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Peter Zijlstra <peterz@...radead.org>
cc:	Christoph Lameter <cl@...ux-foundation.org>,
	Arjan van de Ven <arjan@...radead.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"minchan.kim@...il.com" <minchan.kim@...il.com>,
	"hugh.dickins" <hugh.dickins@...cali.co.uk>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()

On Thu, 7 Jan 2010, Linus Torvalds wrote:
> 
> Well, I have yet to hear a realistic scenario of _how_ to do it all 
> speculatively in the first place, at least not without horribly subtle 
> complexity issues. So I'd really rather see how far we can possibly get by 
> just improving mmap_sem.

For an example of this: it's entirely possible that one avenue of mmap_sem 
improvement would be to look at the _writer_ side, and see how that can be 
improved. 

An example of where we've done that is in madvise(): we used to always 
take it for writing (because _some_ madvise versions needed the exclusive 
access). Now we take it for reading when that is enough, and suddenly some 
operations got way more scalable, and work in the presence of concurrent 
page faults.

And quite frankly, I'd _much_ rather look at that kind of simple and 
logically fairly straightforward solution, instead of doing the whole 
speculative page fault work.

For example: there's no real reason why we take mmap_sem for writing when 
extending an existing vma. And while 'brk()' is a very old-fashioned way of 
doing memory management, it's still quite common. So rather than looking 
at subtle lockless algorithms, why not look at handling the common case of 
extending brk? Make that one take the mmap_sem for _reading_, and then 
do the extending of the brk area with a simple cmpxchg or something?

And "extending brk" is actually a lot more common than shrinking it, and 
is common for exactly the kind of workloads that are often nasty right now 
(threaded allocators with lots and lots of smallish allocations).

The thing is, I can pretty much _guarantee_ that the speculative page 
fault is going to end up doing a lot of nasty stuff that still needs 
almost-global locking, and it's likely to be more complicated and slower 
for the single-threaded case (you end up needing refcounts, a new "local" 
lock or something).

Sure, moving to a per-vma lock can help, but it doesn't help a lot. It 
doesn't help AT ALL for the single-threaded case, and for the 
multi-threaded case I will bet you that a _lot_ of cases will have one 
very hot vma - the regular data vma that gets shared for normal malloc() 
etc. 

So I'm personally rather doubtful about the whole speculative work. It's a 
fair amount of complexity without any really obvious upside. Yes, the 
mmap_sem can be very annoying, but nobody can really honestly claim that 
we've really optimized it all that much.

		Linus