[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <200803051152.15160.nickpiggin@yahoo.com.au>
Date: Wed, 5 Mar 2008 11:52:13 +1100
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Christoph Lameter <clameter@....com>
Cc: akpm@...ux-foundation.org, Andrea Arcangeli <andrea@...ranet.com>,
Robin Holt <holt@....com>, Avi Kivity <avi@...ranet.com>,
Izik Eidus <izike@...ranet.com>,
kvm-devel@...ts.sourceforge.net,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
general@...ts.openfabrics.org,
Steve Wise <swise@...ngridcomputing.com>,
Roland Dreier <rdreier@...co.com>,
Kanoj Sarcar <kanojsarcar@...oo.com>, steiner@....com,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
daniel.blueman@...drics.com
Subject: Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
On Wednesday 05 March 2008 05:58, Christoph Lameter wrote:
> On Tue, 4 Mar 2008, Nick Piggin wrote:
> > > Then put it into the arch code for TLB invalidation. Paravirt ops gives
> > > good examples on how to do that.
> >
> > Put what into arch code?
>
> The mmu notifier code.
It isn't arch specific.
> > > > What about a completely different approach... XPmem runs over
> > > > NUMAlink, right? Why not provide some non-sleeping way to basically
> > > > IPI remote nodes over the NUMAlink where they can process the
> > > > invalidation? If you intra-node cache coherency has to run over this
> > > > link anyway, then presumably it is capable.
> > >
> > > There is another Linux instance at the remote end that first has to
> > > remove its own ptes.
> >
> > Yeah, what's the problem?
>
> The remote end has to invalidate the page which involves locking etc.
I don't see what the problem is.
> > > Also would not work for Inifiniband and other
> > > solutions.
> >
> > infiniband doesn't want it. Other solutions is just handwaving,
> > because if we don't know what the other soloutions are, then we can't
> > make any sort of informed choices.
>
> We need a solution in general to avoid the pinning problems. Infiniband
> has those too.
>
> > > All the approaches that require evictions in an atomic context
> > > are limiting the approach and do not allow the generic functionality
> > > that we want in order to not add alternate APIs for this.
> >
> > The only generic way to do this that I have seen (and the only proposed
> > way that doesn't add alternate APIs for that matter) is turning VM locks
> > into sleeping locks. In which case, Andrea's notifiers will work just
> > fine (except for relatively minor details like rcu list scanning).
>
> No they wont. As you pointed out the callback need RCU locking.
That can be fixed easily.
> > > The good enough solution right now is to pin pages by elevating
> > > refcounts.
> >
> > Which kind of leads to the question of why do you need any further
> > kernel patches if that is good enough?
>
> Well its good enough with severe problems during reclaim, livelocks etc.
> One could improve on that scheme through Rik's work trying to add a new
> page flag that mark pinned pages and then keep them off the LRUs and
> limiting their number. Having pinned page would limit the ability to
> reclaim by the VM and make page migration, memory unplug etc impossible.
Well not impossible. You could have a callback to invalidate the remote
TLB and drop the pin on a given page.
> It is better to have notifier scheme that allows to tell a device driver
> to free up the memory it has mapped.
Yeah, it would be nice for those people with clusters of Altixes. Doesn't
mean it has to go upstream, though.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists