linux-kernel - Re: race with page_referenced_one->ptep_test_and_clear

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.64.0710051601430.14565@blonde.wat.veritas.com>
Date:	Fri, 5 Oct 2007 16:33:16 +0100 (BST)
From:	Hugh Dickins <hugh@...itas.com>
To:	Keir Fraser <keir@...source.com>
cc:	Jeremy Fitzhardinge <jeremy@...p.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	Zachary Amsden <zach@...are.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Jan Beulich <jbeulich@...ell.com>, Andi Kleen <ak@...e.de>,
	Ken Chen <kenchen@...gle.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: race with page_referenced_one->ptep_test_and_clear_young and
 pagetable setup/pulldown

On Fri, 5 Oct 2007, Keir Fraser wrote:
> On 5/10/07 10:05, "Jeremy Fitzhardinge" <jeremy@...p.org> wrote:
> > Andi says:
> >> Do I misread that patch or does it really walk the complete address
> >> space and try to take all possible locks? Isn't that very slow?
> > 
> > That's pretty much what it has to do.  Pinning/unpinning walks the whole
> > pagetable anyway, so it shouldn't be much more expensive.  And they're
> > relatively rare operations (fork, exec, exit).
> 
> It is a shame to do 3x walks per pin or unpin, rather than 1x, though.
> 
> One way to improve this, possibly, is to pin the pte tables individually as
> you go, rather than doing one big pin/unpin just at the root pgd. Then you
> can lock/unlock the pte's as you go. I'd suggest that as a possible post
> 2.6.23 improvement, however. Jan's patch has actually had some testing.

A few points come to mind looking at Jan's patch:

The comment about nested pagetable locks is wrong: mm/mremap.c does
nest pagetable locks; but you wouldn't (as I understand it) be doing
this pinning/unpinning anywhere which could race with an mremap() on
the same mm, so that's a niggle about a comment, not a real issue.

Yes, it would be greatly preferable to take the locks one by one
as needed.  As it stands, I think you're in danger of overflowing
the PREEMPT_BITS 8 area of preempt_count(), venturing into the
SOFTIRQ area: I don't know the real-life consequence of that.

I don't see any protection against hugetlb areas, where the pmd
entry may indicate a hugetlb page rather than a pagetable page.
I guess you'll be needing to test pte_huge().  I don't know if
you want to lock those or skip them: locking is usually just
with page_table_lock, but beware there's also sharing of huge
page pmds between mms - Ken Chen should be able to help on that.

If a 2.6.23 fix is needed, I suggest simply excluding split ptlocks
in the Xen case, as shown by the mm/Kconfig - line in Jan's patch.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/