[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0710051601430.14565@blonde.wat.veritas.com>
Date: Fri, 5 Oct 2007 16:33:16 +0100 (BST)
From: Hugh Dickins <hugh@...itas.com>
To: Keir Fraser <keir@...source.com>
cc: Jeremy Fitzhardinge <jeremy@...p.org>,
Andrew Morton <akpm@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Zachary Amsden <zach@...are.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Rusty Russell <rusty@...tcorp.com.au>,
Jan Beulich <jbeulich@...ell.com>, Andi Kleen <ak@...e.de>,
Ken Chen <kenchen@...gle.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: race with page_referenced_one->ptep_test_and_clear_young and
pagetable setup/pulldown
On Fri, 5 Oct 2007, Keir Fraser wrote:
> On 5/10/07 10:05, "Jeremy Fitzhardinge" <jeremy@...p.org> wrote:
> > Andi says:
> >> Do I misread that patch or does it really walk the complete address
> >> space and try to take all possible locks? Isn't that very slow?
> >
> > That's pretty much what it has to do. Pinning/unpinning walks the whole
> > pagetable anyway, so it shouldn't be much more expensive. And they're
> > relatively rare operations (fork, exec, exit).
>
> It is a shame to do 3x walks per pin or unpin, rather than 1x, though.
>
> One way to improve this, possibly, is to pin the pte tables individually as
> you go, rather than doing one big pin/unpin just at the root pgd. Then you
> can lock/unlock the pte's as you go. I'd suggest that as a possible post
> 2.6.23 improvement, however. Jan's patch has actually had some testing.
A few points come to mind looking at Jan's patch:
The comment about nested pagetable locks is wrong: mm/mremap.c does
nest pagetable locks; but you wouldn't (as I understand it) be doing
this pinning/unpinning anywhere which could race with an mremap() on
the same mm, so that's a niggle about a comment, not a real issue.
Yes, it would be greatly preferable to take the locks one by one
as needed. As it stands, I think you're in danger of overflowing
the PREEMPT_BITS 8 area of preempt_count(), venturing into the
SOFTIRQ area: I don't know the real-life consequence of that.
I don't see any protection against hugetlb areas, where the pmd
entry may indicate a hugetlb page rather than a pagetable page.
I guess you'll be needing to test pte_huge(). I don't know if
you want to lock those or skip them: locking is usually just
with page_table_lock, but beware there's also sharing of huge
page pmds between mms - Ken Chen should be able to help on that.
If a 2.6.23 fix is needed, I suggest simply excluding split ptlocks
in the Xen case, as shown by the mm/Kconfig - line in Jan's patch.
Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists