Message-ID: <alpine.LFD.1.10.0805141114410.3019@woody.linux-foundation.org>
Date: Wed, 14 May 2008 11:27:14 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Christoph Lameter <clameter@....com>
cc: Robin Holt <holt@....com>, Nick Piggin <npiggin@...e.de>,
Nick Piggin <nickpiggin@...oo.com.au>,
Andrea Arcangeli <andrea@...ranet.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Jack Steiner <steiner@....com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
kvm-devel@...ts.sourceforge.net,
Kanoj Sarcar <kanojsarcar@...oo.com>,
Roland Dreier <rdreier@...co.com>,
Steve Wise <swise@...ngridcomputing.com>,
linux-kernel@...r.kernel.org, Avi Kivity <avi@...ranet.com>,
linux-mm@...ck.org, general@...ts.openfabrics.org,
Hugh Dickins <hugh@...itas.com>,
Rusty Russell <rusty@...tcorp.com.au>,
Anthony Liguori <aliguori@...ibm.com>,
Chris Wright <chrisw@...hat.com>,
Marcelo Tosatti <marcelo@...ck.org>,
Eric Dumazet <dada1@...mosbay.com>,
"Paul E. McKenney" <paulmck@...ibm.com>
Subject: Re: [PATCH 08 of 11] anon-vma-rwsem
On Wed, 14 May 2008, Christoph Lameter wrote:
>
> The problem is that the code in rmap.c try_to_unmap() and friends loops
> over reverse maps after taking a spinlock. The mm_struct is only known
> after the rmap has been accessed. This means *inside* the spinlock.
So you queue them. That's what we do with things like the dirty bit. We
need to hold various spinlocks to look up pages, but then we can't
actually call the filesystem with the spinlock held.
Converting a spinlock to a waiting lock for things like that is simply not
acceptable. You have to work with the system.
Yeah, there's only a single bit worth of information on whether a page is
dirty or not, so "queueing" that information is trivial (it's just the
return value from "page_mkclean_file()"). Some things are harder than
others, and I suspect you need some kind of "gather" structure to queue up
all the vma's that can be affected.
But it sounds like for the case of rmap, the approach of:
 - the page lock is the higher-level "sleeping lock" (which makes sense,
   since this is very close to an IO event, and that is what the page lock
   is generally used for)
   But hey, it could be anything else - maybe you have some other even
   bigger lock to allow you to handle lots of pages in one go.
 - with that lock held, you do the whole rmap dance (which requires
   spinlocks) and gather up the vma's and the struct mm's involved.
 - outside the spinlocks you then do whatever it is you need to do.
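The three steps above can be sketched in userspace C. This is only an
illustration of the locking order, not kernel code: "struct space" (a stand-in
for a struct mm), "struct page_rmap", "struct gather", "notify_space()" and
"page_notify_all()" are all hypothetical names, the spinlock is a toy built on
C11 atomic_flag, and a pthread mutex plays the role of the sleeping page lock.
The sketch also assumes the caller keeps every gathered object alive until the
slow work has finished; real code would need reference counts for that.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_SPACES 8            /* fixed capacity keeps the sketch short */

/* Hypothetical stand-in for an address space (think "struct mm"). */
struct space {
    int id;
    int notified;               /* counts calls to notify_space() */
};

/* Toy test-and-set spinlock; a holder must never block. */
struct spinlock { atomic_flag held; };

static void spin_lock(struct spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                       /* busy-wait */
}

static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical stand-in for one page and its reverse map. */
struct page_rmap {
    pthread_mutex_t page_lock;  /* higher-level sleeping lock */
    struct spinlock rmap_lock;  /* protects spaces[] and nr */
    struct space *spaces[MAX_SPACES];
    size_t nr;
};

/* Private copy of what was found under the spinlock. */
struct gather {
    struct space *spaces[MAX_SPACES];
    size_t nr;
};

void rmap_init(struct page_rmap *p)
{
    pthread_mutex_init(&p->page_lock, NULL);
    atomic_flag_clear(&p->rmap_lock.held);
    p->nr = 0;
}

/* Returns 0 on success, -1 if the reverse map is full. */
int rmap_add(struct page_rmap *p, struct space *s)
{
    int ret = -1;

    spin_lock(&p->rmap_lock);
    if (p->nr < MAX_SPACES) {
        p->spaces[p->nr++] = s;
        ret = 0;
    }
    spin_unlock(&p->rmap_lock);
    return ret;
}

/* Stands in for work that may sleep, so it must run without the spinlock. */
static void notify_space(struct space *s)
{
    s->notified++;
}

/* Returns how many address spaces were gathered and notified. */
size_t page_notify_all(struct page_rmap *p)
{
    struct gather g = { .nr = 0 };
    size_t i;

    pthread_mutex_lock(&p->page_lock);      /* step 1: sleeping lock */

    spin_lock(&p->rmap_lock);               /* step 2: walk and gather */
    for (i = 0; i < p->nr; i++)
        g.spaces[g.nr++] = p->spaces[i];
    spin_unlock(&p->rmap_lock);

    assert(g.nr <= MAX_SPACES);
    for (i = 0; i < g.nr; i++)              /* step 3: slow work, unlocked */
        notify_space(g.spaces[i]);

    pthread_mutex_unlock(&p->page_lock);
    return g.nr;
}
```

A caller would rmap_init() a page, rmap_add() each address space, and then
call page_notify_all(); the slow callback only ever runs with the spinlock
dropped and the sleeping lock still held.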
This doesn't sound all that different from TLB shoot-down in SMP, and the
"mmu_gather" structure. Now, admittedly we can do the TLB shoot-down while
holding the spinlocks, but if we couldn't that's how we'd still do it:
it would get more involved (because we'd need to guarantee that the gather
can hold *all* the pages - right now we can just flush in the middle if we
need to), but it wouldn't be all that fundamentally different.
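One possible way to guarantee that a gather can hold everything, shown here as
a userspace C sketch and not as what mmu_gather does: size the buffer outside
the spinlock, and retry if it turns out to be too small once the lock is held.
"struct item_set" and "snapshot_all()" are hypothetical names, the spinlock is
a toy built on C11 atomic_flag, and realloc() stands in for any allocation
that must not happen under a spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define SET_MAX 64

/* Toy test-and-set spinlock; a holder must never block. */
struct spinlock { atomic_flag held; };

static void spin_lock(struct spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                       /* busy-wait */
}

static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical spinlock-protected set of items to be gathered. */
struct item_set {
    struct spinlock lock;
    int items[SET_MAX];
    size_t nr;
};

/*
 * Copy *all* items into a freshly allocated array.
 * Returns the array (caller frees it) and stores the count in *out_nr,
 * or returns NULL if allocation fails.
 */
int *snapshot_all(struct item_set *s, size_t *out_nr)
{
    size_t cap = 4;             /* initial guess */
    int *buf = NULL;

    for (;;) {
        /* Allocation may block, so it happens with the lock dropped. */
        int *nbuf = realloc(buf, cap * sizeof *buf);
        if (!nbuf) {
            free(buf);
            return NULL;
        }
        buf = nbuf;

        spin_lock(&s->lock);
        if (s->nr <= cap) {
            /* Everything fits: copy in one go, no flush in the middle. */
            for (size_t i = 0; i < s->nr; i++)
                buf[i] = s->items[i];
            *out_nr = s->nr;
            spin_unlock(&s->lock);
            return buf;
        }
        cap = s->nr;            /* too small: remember the size, retry */
        spin_unlock(&s->lock);
    }
}
```

The loop can spin more than once if the set grows between the unlock and the
next lock, which is the extra complication compared with simply flushing a
full batch while still holding the lock.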
And no, I really haven't even wanted to look at what XPMEM really needs to
do, so maybe the above thing doesn't work for you, and you have other
issues. I'm just pointing you in a general direction, not trying to say
"this is exactly how to get there".
Linus