Message-Id: <20100105143046.73938ea2.kamezawa.hiroyu@jp.fujitsu.com>
Date: Tue, 5 Jan 2010 14:30:46 +0900
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Minchan Kim <minchan.kim@...il.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>, cl@...ux-foundation.org,
"hugh.dickins" <hugh.dickins@...cali.co.uk>,
Nick Piggin <nickpiggin@...oo.com.au>,
Ingo Molnar <mingo@...e.hu>
Subject: Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()
On Mon, 4 Jan 2010 21:10:29 -0800 (PST)
Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>
>
> On Tue, 5 Jan 2010, KAMEZAWA Hiroyuki wrote:
> >
> > Then, my patch dropped the speculative trial of the page fault and did the
> > job synchronously here. I'm still considering how to insert some barrier to
> > delay calling remove_vma() until all page faults have finished. One idea was
> > a reference count, but it was said to be not crazy enough.
>
> What lock would you use to protect the vma lookup (in order to then
> increase the refcount)? A sequence lock with RCU lookup of the vma?
>
Ah, I just used a reference counter to track "how many threads are in a
page fault against this vma right now". Below is from my post:
==
+ rb_node = rcu_dereference(rb_node->rb_left);
+ } else
+ rb_node = rcu_dereference(rb_node->rb_right);
+ }
+ if (vma) {
+ if ((vma->vm_start <= addr) && (addr < vma->vm_end)) {
+ if (!atomic_inc_not_zero(&vma->refcnt))
+ vma = NULL;
+ } else
+ vma = NULL;
+ }
+ rcu_read_unlock();
...
+void vma_put(struct vm_area_struct *vma)
+{
+ if ((atomic_dec_return(&vma->refcnt) == 1) &&
+ waitqueue_active(&vma->wait_queue))
+ wake_up(&vma->wait_queue);
+ return;
+}
==
And then wait for this reference count to drop to zero before calling
remove_vma():
==
+/* called when vma is unlinked and wait for all racy access.*/
+static void invalidate_vma_before_free(struct vm_area_struct *vma)
+{
+ atomic_dec(&vma->refcnt);
+ wait_event(vma->wait_queue, !atomic_read(&vma->refcnt));
+}
+
....
* us to remove next before dropping the locks.
*/
__vma_unlink(mm, next, vma);
+ invalidate_vma_before_free(next);
if (file)
__remove_shared_vm_struct(next, file, mapping);
etc....
==
The code above is a bit heavy (and buggy); I have some fixes.
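To make the intended lifecycle concrete, here is a minimal user-space model
of the scheme above. This is only an illustration, not the posted patch:
C11 atomics and a pthread condvar stand in for atomic_inc_not_zero(),
wait_event() and wake_up(), and the vma_model type and the vma_get() helper
are made-up names.
==
/* compile with: cc -pthread model.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct vma_model {
	atomic_int refcnt;		/* 1 while the vma is linked into the mm */
	pthread_mutex_t lock;		/* protects the wait below */
	pthread_cond_t wait_queue;
};

/* fault side: take a reference unless the vma is already being torn down */
static bool vma_get(struct vma_model *vma)
{
	int old = atomic_load(&vma->refcnt);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&vma->refcnt, &old, old + 1))
			return true;	/* behaves like atomic_inc_not_zero() */
	}
	return false;
}

/* fault side: drop the reference, waking the unmapper if it was the last */
static void vma_put(struct vma_model *vma)
{
	if (atomic_fetch_sub(&vma->refcnt, 1) == 1) {
		pthread_mutex_lock(&vma->lock);
		pthread_cond_broadcast(&vma->wait_queue);
		pthread_mutex_unlock(&vma->lock);
	}
}

/* unmap side: drop the mm's initial reference and wait for in-flight faults */
static void invalidate_vma_before_free(struct vma_model *vma)
{
	vma_put(vma);
	pthread_mutex_lock(&vma->lock);
	while (atomic_load(&vma->refcnt) != 0)
		pthread_cond_wait(&vma->wait_queue, &vma->lock);
	pthread_mutex_unlock(&vma->lock);
}

int main(void)
{
	struct vma_model vma = {
		.refcnt = 1,
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.wait_queue = PTHREAD_COND_INITIALIZER,
	};

	if (vma_get(&vma)) {		/* a fault takes a reference ... */
		printf("fault running, refcnt=%d\n", atomic_load(&vma.refcnt));
		vma_put(&vma);		/* ... and drops it when finished */
	}
	invalidate_vma_before_free(&vma);
	printf("vma can be freed now\n");
	return 0;
}
==
The property this models is the same one the patch relies on: the fault side
can only obtain a reference while the vma is still linked (refcnt > 0), and
the unmap side cannot return from invalidate_vma_before_free() while any
fault still holds a reference.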
> Sounds doable. But it also sounds way more expensive than the current VM
> fault handling, which is pretty close to optimal for single-threaded
> cases.. That RCU lookup might be cheap, but just the refcount is generally
> going to be as expensive as a lock.
>
For single-threaded apps, my patch will have no benefit
(but will not make anything worse).
I'll add a CONFIG option, and I wonder whether I can enable the speculative
vma lookup only after the mm_struct is shared (but the patch may be messy...).
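For example, something like the following (a rough sketch, not from the
patch set; CONFIG_SPECULATIVE_VMA_LOOKUP and want_speculative_fault() are
made-up names, only mm->mm_users is real):
==
#ifdef CONFIG_SPECULATIVE_VMA_LOOKUP
/* only try the mmap_sem-less path when the mm is really shared */
static inline bool want_speculative_fault(struct mm_struct *mm)
{
	return atomic_read(&mm->mm_users) > 1;
}
#else
static inline bool want_speculative_fault(struct mm_struct *mm)
{
	return false;
}
#endif
==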
> Are there some particular mappings that people care about more than
> others? If we limit the speculative lookup purely to anonymous memory,
> that might simplify the problem space?
>
I wonder, for ordinary users who don't write highly optimized programs,
whether a small benefit of skipping mmap_sem is reducing the mmap_sem
ping-pong after fork()->exec(), which can cause some jitter in the
application. So I'd be glad if I can help file-backed vmas, too.
> [ From past experiences, I suspect DB people would be upset and really
> want it for the general file mapping case.. But maybe the main usage
> scenario is something else this time? ]
>
I'd like to hear use cases from really heavy users, too. Christoph?
Thanks,
-Kame