linux-kernel - Re: [PATCH 08 of 11] anon-vma-rwsem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LFD.1.10.0805141114410.3019@woody.linux-foundation.org>
Date:	Wed, 14 May 2008 11:27:14 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Christoph Lameter <clameter@....com>
cc:	Robin Holt <holt@....com>, Nick Piggin <npiggin@...e.de>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Andrea Arcangeli <andrea@...ranet.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jack Steiner <steiner@....com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	kvm-devel@...ts.sourceforge.net,
	Kanoj Sarcar <kanojsarcar@...oo.com>,
	Roland Dreier <rdreier@...co.com>,
	Steve Wise <swise@...ngridcomputing.com>,
	linux-kernel@...r.kernel.org, Avi Kivity <avi@...ranet.com>,
	linux-mm@...ck.org, general@...ts.openfabrics.org,
	Hugh Dickins <hugh@...itas.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Anthony Liguori <aliguori@...ibm.com>,
	Chris Wright <chrisw@...hat.com>,
	Marcelo Tosatti <marcelo@...ck.org>,
	Eric Dumazet <dada1@...mosbay.com>,
	"Paul E. McKenney" <paulmck@...ibm.com>
Subject: Re: [PATCH 08 of 11] anon-vma-rwsem

On Wed, 14 May 2008, Christoph Lameter wrote:
> 
> The problem is that the code in rmap.c try_to_umap() and friends loops 
> over reverse maps after taking a spinlock. The mm_struct is only known 
> after the rmap has been acccessed. This means *inside* the spinlock.

So you queue them. That's what we do with things like the dirty bit. We 
need to hold various spinlocks to look up pages, but then we can't 
actually call the filesystem with the spinlock held.

Converting a spinlock to a waiting lock for things like that is simply not 
acceptable. You have to work with the system.

Yeah, there's only a single bit worth of information on whether a page is 
dirty or not, so "queueing" that information is trivial (it's just the 
return value from "page_mkclean_file()". Some things are harder than 
others, and I suspect you need some kind of "gather" structure to queue up 
all the vma's that can be affected.

But it sounds like for the case of rmap, the approach of:

 - the page lock is the higher-level "sleeping lock" (which makes sense, 
   since this is very close to an IO event, and that is what the page lock 
   is generally used for)

   But hey, it could be anything else - maybe you have some other even 
   bigger lock to allow you to handle lots of pages in one go.

 - with that lock held, you do the whole rmap dance (which requires 
   spinlocks) and gather up the vma's and the struct mm's involved. 

 - outside the spinlocks you then do whatever it is you need to do.

This doesn't sound all that different from TLB shoot-down in SMP, and the 
"mmu_gather" structure. Now, admittedly we can do the TLB shoot-down while 
holding the spinlocks, but if we couldn't that's how we'd still do it: 
it would get more involved (because we'd need to guarantee that the gather 
can hold *all* the pages - right now we can just flush in the middle if we 
need to), but it wouldn't be all that fundamentally different.

And no, I really haven't even wanted to look at what XPMEM really needs to 
do, so maybe the above thing doesn't work for you, and you have other 
issues. I'm just pointing you in a general direction, not trying to say 
"this is exactly how to get there". 

		Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/