[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <50863609fb8263f3a0f9111a304a9dbc.squirrel@webmail-b.css.fujitsu.com>
Date: Mon, 28 Dec 2009 19:57:25 +0900 (JST)
From: "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@...fujitsu.com>
To: "Peter Zijlstra" <peterz@...radead.org>
Cc: "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@...fujitsu.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"minchan.kim@...il.com" <minchan.kim@...il.com>,
cl@...ux-foundation.org
Subject: Re: [RFC PATCH] asynchronous page fault.
Peter Zijlstra wrote:
> On Mon, 2009-12-28 at 18:58 +0900, KAMEZAWA Hiroyuki wrote:
>> Peter Zijlstra wrote:
>> > On Mon, 2009-12-28 at 09:36 +0900, KAMEZAWA Hiroyuki wrote:
>> >>
>> >> > The idea is to let the RCU lock span whatever length you need the
>> vma
>> >> > for, the easy way is to simply use PREEMPT_RCU=y for now,
>> >>
>> >> I tried to remove his kind of reference count trick but I can't do
>> that
>> >> without synchronize_rcu() somewhere in unmap code. I don't like that
>> and
>> >> use this refcnt.
>> >
>> > Why, because otherwise we can access page tables for an already
>> unmapped
>> > vma? Yeah that is the interesting bit ;-)
>> >
>> Without that
>> vma->a_ops->fault()
>> and
>> vma->a_ops->unmap()
>> can be called at the same time. and vma->vm_file can be dropped while
>> vma->a_ops->fault() is called. etc...
>
> Right, so acquiring the PTE lock will either instantiate page tables for
> a non-existing vma, leaving you with an interesting mess to clean up, or
> you can also RCU free the page tables (in the same RCU domain as the
> vma) which will mostly[*] avoid that issue.
>
> [ To make live really really interesting you could even re-use the
> page-tables and abort the RCU free when the region gets re-mapped
> before the RCU callbacks happen, this will avoid a free/alloc cycle
> for fast remapping workloads. ]
>
> Once you hold the PTE lock, you can validate the vma you looked up,
> since ->unmap() syncs against it. If at that time you find the
> speculative vma is dead, you fail and re-try the fault.
>
My previous one did similar but still used vma->refcnt. I'll consider again.
> [*] there still is the case of faulting on an address that didn't
> previously have page-tables hence the unmap page table scan will have
> skipped it -- my hacks simply leaked page tables here, but the idea was
> to acquire the mmap_sem for reading and cleanup properly.
>
Hmm, thank you for hints.
But this current version implementation has some reasons.
- because pmd has some trobles because of quicklists..I don't wanted to
touch free routine of them.
- pmd can be removed asynchronously while page fault is going on.
- I'd like to avoid modification to free_pte_range etc...
I feel pmd/page-table-lock is a hard to handle object than expected.
I'll consider some about per-thread approach or split vma approach
or scalable range lock or some synchronization without heavy atomic op.
Anyway, I think I show something can be done without mmap_sem modification.
See you next year.
Thanks,
-Kame
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists