[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080429153052.GE8315@duo.random>
Date: Tue, 29 Apr 2008 17:30:52 +0200
From: Andrea Arcangeli <andrea@...ranet.com>
To: Christoph Lameter <clameter@....com>
Cc: Robin Holt <holt@....com>, Jack Steiner <steiner@....com>,
Nick Piggin <npiggin@...e.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
kvm-devel@...ts.sourceforge.net,
Kanoj Sarcar <kanojsarcar@...oo.com>,
Roland Dreier <rdreier@...co.com>,
Steve Wise <swise@...ngridcomputing.com>,
linux-kernel@...r.kernel.org, Avi Kivity <avi@...ranet.com>,
linux-mm@...ck.org, general@...ts.openfabrics.org,
Hugh Dickins <hugh@...itas.com>, akpm@...ux-foundation.org,
Rusty Russell <rusty@...tcorp.com.au>
Subject: Re: [PATCH 01 of 12] Core of mmu notifiers
On Mon, Apr 28, 2008 at 06:28:06PM -0700, Christoph Lameter wrote:
> On Tue, 29 Apr 2008, Andrea Arcangeli wrote:
>
> > Frankly I've absolutely no idea why rcu is needed in all rmap code
> > when walking the page->mapping. Definitely the PG_locked is taken so
> > there's no way page->mapping could possibly go away under the rmap
> > code, hence the anon_vma can't go away as it's queued in the vma, and
> > the vma has to go away before the page is zapped out of the pte.
>
> zap_pte_range can race with the rmap code and it does not take the page
> lock. The page may not go away since a refcount was taken but the mapping
> can go away. Without RCU you have no guarantee that the anon_vma is
> existing when you take the lock.
There's some room for improvement, like using down_read_trylock, if
that succeeds we don't need to increase the refcount and we can keep
the rcu_read_lock held instead.
Secondly we don't need to increase the refcount in fork() when we
queue the vma-copy in the anon_vma. You should init the refcount to 1
when the anon_vma is allocated, remove the atomic_inc from all code
(except when down_read_trylock fails) and then change anon_vma_unlink
to:
up_write(&anon_vma->sem);
if (empty)
put_anon_vma(anon_vma);
While the down_read_trylock surely won't help in AIM, the second
change will reduce a bit the overhead in the VM core fast paths by
avoiding all refcounting changes by checking the list_empty the same
way the current code does. I really like how I designed the garbage
collection through list_empty and that's efficient and I'd like to
keep it.
I however doubt this will bring us back to the same performance of the
current spinlock version, as the real overhead should come out of
overscheduling in down_write ai anon_vma_link. Here an initially
spinning lock would help but that's gray area, it greatly depends on
timings, and on very large systems where a cacheline wait with many
cpus forking at the same time takes more than scheduling a semaphore
may not slowdown performance that much. So I think the only way is a
configuration option to switch the locking at compile time, then XPMEM
will depend on that option to be on, I don't see a big deal and this
guarantees embedded isn't screwed up by totally unnecessary locks on UP.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists