Message-ID: <1261996258.7135.67.camel@laptop>
Date: Mon, 28 Dec 2009 11:30:58 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"minchan.kim@...il.com" <minchan.kim@...il.com>,
cl@...ux-foundation.org
Subject: Re: [RFC PATCH] asynchronous page fault.
On Mon, 2009-12-28 at 18:58 +0900, KAMEZAWA Hiroyuki wrote:
> Peter Zijlstra wrote:
> > On Mon, 2009-12-28 at 09:36 +0900, KAMEZAWA Hiroyuki wrote:
> >>
> >> > The idea is to let the RCU lock span whatever length you need the vma
> >> > for, the easy way is to simply use PREEMPT_RCU=y for now,
> >>
> >> I tried to remove this kind of reference-count trick, but I can't do that
> >> without a synchronize_rcu() somewhere in the unmap code. I don't like that,
> >> so I use this refcnt.
> >
> > Why? Because otherwise we can access page tables for an already unmapped
> > vma? Yeah, that is the interesting bit ;-)
> >
> Without that
> vma->a_ops->fault()
> and
> vma->a_ops->unmap()
> can be called at the same time, and vma->vm_file can be dropped while
> vma->a_ops->fault() is running, etc.
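The refcount approach discussed above can be sketched as a small userspace model in C. Everything here (struct spec_vma, vma_get(), vma_put(), vma_unmap(), vma_rcu_free()) is a hypothetical illustration, not code from the RFC patch. It assumes the vma struct itself is RCU-freed, so a speculative lookup may still read it, while the reference count keeps vm_file alive for a fault that raced with unmap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: the struct memory is assumed to be RCU-freed, so a
 * speculative lookup may still read refcnt; the refcount protects what
 * hangs off the vma (here vm_file) while a fault is using it. */
struct spec_vma {
    atomic_int refcnt;   /* mapping holds 1; each in-flight fault holds 1 */
    int *vm_file;        /* stands in for vma->vm_file */
};

static struct spec_vma *vma_create(int file_id)
{
    struct spec_vma *vma = malloc(sizeof(*vma));
    assert(vma != NULL);
    atomic_init(&vma->refcnt, 1);
    vma->vm_file = malloc(sizeof(*vma->vm_file));
    assert(vma->vm_file != NULL);
    *vma->vm_file = file_id;
    return vma;
}

/* Fault side: pin the vma only if unmap has not already dropped the last
 * reference (the same idea as the kernel's atomic_inc_not_zero()). */
static bool vma_get(struct spec_vma *vma)
{
    int old = atomic_load(&vma->refcnt);
    while (old != 0) {
        if (atomic_compare_exchange_weak(&vma->refcnt, &old, old + 1))
            return true;
        /* on failure, old now holds the current value; try again */
    }
    return false;
}

/* Dropping the last reference releases vm_file; the struct itself is left
 * for the (modeled) RCU callback to free. */
static void vma_put(struct spec_vma *vma)
{
    int old = atomic_fetch_sub(&vma->refcnt, 1);
    assert(old > 0);
    if (old == 1) {
        free(vma->vm_file);
        vma->vm_file = NULL;
    }
}

/* Unmap side: drop the mapping's reference. A fault that already pinned
 * the vma keeps vm_file alive until its own vma_put(). */
static void vma_unmap(struct spec_vma *vma)
{
    vma_put(vma);
}

/* Stand-in for the RCU callback that finally frees the struct. */
static void vma_rcu_free(struct spec_vma *vma)
{
    assert(atomic_load(&vma->refcnt) == 0);
    free(vma);
}
```

A fault that pinned the vma before unmap can keep using vm_file; after the last reference is gone, a late speculative lookup is refused and has to fall back to the slow path.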
Right, so acquiring the PTE lock may instantiate page tables for a vma
that no longer exists, leaving you with an interesting mess to clean up.
Alternatively, you can RCU-free the page tables (in the same RCU domain
as the vma), which mostly[*] avoids that issue.
[ To make life really, really interesting, you could even re-use the
page tables and abort the RCU free when the region gets re-mapped
before the RCU callbacks run; this avoids a free/alloc cycle for
fast-remapping workloads. ]
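A rough userspace model of RCU-freed page tables with the "abort the free on remap" trick, using hypothetical names (struct page_table, pt_map(), pt_defer_free(), pt_grace_period_end()). In the kernel the deferral would go through call_rcu(); here the end of a grace period is simulated by an explicit call, no locking is shown, and a reused table is assumed to have had its entries cleared by unmap.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: unmapped page tables wait for a grace period before
 * being freed, and a quick remap of the same region reclaims them. */
struct page_table {
    unsigned long base;          /* virtual address this table covers */
    struct page_table *next;     /* link on the pending-free list */
};

static struct page_table *pending;   /* tables waiting for a grace period */
static int tables_freed;             /* counts real frees, for illustration */

/* Unmap: readers under RCU may still walk this table, so defer the free. */
static void pt_defer_free(struct page_table *pt)
{
    pt->next = pending;
    pending = pt;
}

/* Map a region: if a table for it is still waiting to be freed, pull it
 * back off the pending list (aborting the free) instead of allocating. */
static struct page_table *pt_map(unsigned long base)
{
    for (struct page_table **pp = &pending; *pp; pp = &(*pp)->next) {
        if ((*pp)->base == base) {
            struct page_table *pt = *pp;
            *pp = pt->next;      /* unlink: this table is live again */
            pt->next = NULL;
            return pt;
        }
    }
    struct page_table *pt = calloc(1, sizeof(*pt));
    assert(pt != NULL);
    pt->base = base;
    return pt;
}

/* Grace period over: no reader can still see the remaining tables. */
static void pt_grace_period_end(void)
{
    while (pending) {
        struct page_table *pt = pending;
        pending = pt->next;
        free(pt);
        tables_freed++;
    }
}
```

A region that is unmapped and immediately mapped again gets its old table back, and only tables nobody reclaimed are freed when the grace period ends.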
Once you hold the PTE lock, you can validate the vma you looked up,
since ->unmap() synchronizes against that lock. If at that point you find
the speculatively looked-up vma is dead, you fail and retry the fault.
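The validate-and-retry step can be modeled with a pthread mutex standing in for the PTE lock. struct mm_model, struct vma_model, unmap_vma(), speculative_fault() and the FAULT_* values are hypothetical; the sketch only shows that, because unmap marks the vma dead while holding the same lock, a fault that takes the lock either sees a live vma or backs off with FAULT_RETRY.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: unmap marks the vma dead while holding the PTE lock,
 * so a fault that takes the same lock can tell whether the vma it found
 * speculatively (without mmap_sem) is still valid. */
enum fault_result { FAULT_DONE, FAULT_RETRY };

struct mm_model {
    pthread_mutex_t ptl;   /* stands in for the PTE lock */
    unsigned long pte;     /* a single "PTE" for illustration; 0 = empty */
};

struct vma_model {
    bool dead;             /* written only with ptl held */
};

static void unmap_vma(struct mm_model *mm, struct vma_model *vma)
{
    pthread_mutex_lock(&mm->ptl);
    vma->dead = true;      /* unmap syncs against the PTE lock */
    mm->pte = 0;           /* zap the mapping */
    pthread_mutex_unlock(&mm->ptl);
}

static enum fault_result speculative_fault(struct mm_model *mm,
                                           struct vma_model *vma,
                                           unsigned long new_pte)
{
    pthread_mutex_lock(&mm->ptl);
    if (vma->dead) {
        /* The vma we looked up went away: back off and redo the fault. */
        pthread_mutex_unlock(&mm->ptl);
        return FAULT_RETRY;
    }
    mm->pte = new_pte;     /* vma validated under ptl; install the entry */
    pthread_mutex_unlock(&mm->ptl);
    return FAULT_DONE;
}
```

A real retry would redo the lookup, presumably falling back to taking mmap_sem; that part is left out here.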
[*] There is still the case of faulting on an address that did not
previously have page tables, so the unmap page-table scan will have
skipped it; my hacks simply leaked page tables here, but the idea was
to acquire the mmap_sem for reading and clean up properly.
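The footnote's cleanup path might look roughly like this, with a pthread rwlock standing in for mmap_sem and a mutex for the PTE lock. All names are hypothetical and the single "page table" is a simplification. The fault allocated a table that unmap's scan could not have seen; rather than leaking it, it takes mmap_sem for reading (assumed to exclude mmap/munmap, which take it for writing) and frees the table only if no vma covers the range any more.

```c
#define _POSIX_C_SOURCE 200809L  /* for pthread_rwlock_t under strict C */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct vma_model { bool dead; };
struct pte_table { unsigned long entries[8]; };

struct mm_model {
    pthread_rwlock_t mmap_sem;  /* write: mmap/unmap; read: cleanup */
    pthread_mutex_t ptl;        /* stands in for the PTE lock */
    struct vma_model *vma;      /* vma covering the range, NULL if none */
    struct pte_table *table;    /* NULL until some fault allocates it */
};

/* Unmap: its scan can only free page tables that exist right now. */
static void unmap_range(struct mm_model *mm)
{
    pthread_rwlock_wrlock(&mm->mmap_sem);
    pthread_mutex_lock(&mm->ptl);
    if (mm->vma)
        mm->vma->dead = true;
    mm->vma = NULL;
    free(mm->table);
    mm->table = NULL;
    pthread_mutex_unlock(&mm->ptl);
    pthread_rwlock_unlock(&mm->mmap_sem);
}

/* Slow path: with mmap_sem held for reading no vma can appear or vanish,
 * so if nothing covers the range the orphaned table can be freed. */
static void cleanup_orphan_table(struct mm_model *mm)
{
    pthread_rwlock_rdlock(&mm->mmap_sem);
    pthread_mutex_lock(&mm->ptl);
    if (!mm->vma && mm->table) {
        free(mm->table);
        mm->table = NULL;
    }
    pthread_mutex_unlock(&mm->ptl);
    pthread_rwlock_unlock(&mm->mmap_sem);
}

/* Speculative fault on a vma found without mmap_sem; returns false when
 * the fault must be retried. */
static bool speculative_fault(struct mm_model *mm, struct vma_model *vma)
{
    bool allocated = false;

    pthread_mutex_lock(&mm->ptl);
    if (!mm->table) {
        mm->table = calloc(1, sizeof(*mm->table));
        assert(mm->table != NULL);
        allocated = true;
    }
    if (!vma->dead) {
        mm->table->entries[0] = 1;   /* install the PTE */
        pthread_mutex_unlock(&mm->ptl);
        return true;
    }
    pthread_mutex_unlock(&mm->ptl);
    if (allocated)
        cleanup_orphan_table(mm);    /* instead of leaking it */
    return false;
}
```

The re-check under mmap_sem matters: if a new vma was mapped over the range in the meantime, the table is legitimately in use and is left alone.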