linux-kernel - Re: [PATCH] mm/mlock: fix BUG_ON unlocked page for nolinear VMAs


Message-ID: <52C7C1AA.2070701@suse.cz>
Date:	Sat, 04 Jan 2014 09:09:14 +0100
From:	Vlastimil Babka <vbabka@...e.cz>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Sasha Levin <sasha.levin@...cle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Wanpeng Li <liwanp@...ux.vnet.ibm.com>,
	Michel Lespinasse <walken@...gle.com>,
	Bob Liu <bob.liu@...cle.com>, Nick Piggin <npiggin@...e.de>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Rik van Riel <riel@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	Mel Gorman <mgorman@...e.de>, Minchan Kim <minchan@...nel.org>,
	Hugh Dickins <hughd@...gle.com>,
	Johannes Weiner <hannes@...xchg.org>,
	linux-mm <linux-mm@...ck.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm/mlock: fix BUG_ON unlocked page for nolinear VMAs

On 01/04/2014 01:18 AM, Linus Torvalds wrote:
> On Fri, Jan 3, 2014 at 3:36 PM, Vlastimil Babka <vbabka@...e.cz> wrote:
>>
>> I'm for going with the removal of BUG_ON. The TestSetPageMlocked should provide enough
>> race protection.
> 
> Maybe. But dammit, that's subtle, and I don't think you're even right.
> 
> It basically depends on mlock_vma_page() and munlock_vma_page() being
> able to run CONCURRENTLY on the same page. In particular, you could
> have a mlock_vma_page() set the bit on one CPU, and munlock_vma_page()
> immediately clearing it on another, and then the rest of those
> functions could run with a totally arbitrary interleaving when working
> with the exact same page.
> 
> They both do basically
> 
>     if (!isolate_lru_page(page))
>         putback_lru_page(page);
> 
> but one or the other would randomly win the race (it's internally
> protected by the lru lock), and *if* the munlock_vma_page() wins it,
> it would also do
> 
>     try_to_munlock(page);
> 
> but if mlock_vma_page() wins it, that wouldn't happen. That looks
> entirely broken - you end up with the PageMlocked bit clear, but
> try_to_munlock() was never called on that page, because
> mlock_vma_page() got to the page isolation before the "subsequent"
> munlock_vma_page().
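
The interleaving described in the quoted text can be replayed deterministically in a small user-space sketch. Everything here is a hypothetical model: `struct page_model` and the `*_model` helpers only stand in for `struct page`, `isolate_lru_page()`, `putback_lru_page()` and `try_to_munlock()`, and are not the kernel code. It assumes, as the quoted text does, that the munlock path only calls `try_to_munlock()` when its own isolation succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page; NOT the kernel's struct page. */
struct page_model {
    bool mlocked;        /* stands in for the PageMlocked flag */
    bool on_lru;         /* page currently sits on an LRU list */
    int  munlock_scans;  /* how often the try_to_munlock() stand-in ran */
};

/* Stand-in for isolate_lru_page(): returns 0 on success, nonzero on failure. */
static int isolate_model(struct page_model *p)
{
    if (!p->on_lru)
        return -1;
    p->on_lru = false;
    return 0;
}

/* Stand-in for putback_lru_page(). */
static void putback_model(struct page_model *p)
{
    p->on_lru = true;
}

/* Replay: CPU A (mlock path) wins the isolation race against CPU B (munlock path). */
static struct page_model replay_mlock_wins(void)
{
    struct page_model p = { .mlocked = false, .on_lru = true, .munlock_scans = 0 };

    p.mlocked = true;                 /* A: test-and-set of the flag */
    bool was_mlocked = p.mlocked;     /* B: test-and-clear sees the bit set ... */
    p.mlocked = false;                /* ... and clears it right away */

    int a_isolated = isolate_model(&p);          /* A gets the page first */
    if (was_mlocked && isolate_model(&p) == 0) { /* B fails: page is off the LRU */
        p.munlock_scans++;                       /* never reached in this replay */
        putback_model(&p);
    }
    if (a_isolated == 0)
        putback_model(&p);                       /* A puts the page back */
    return p;
}

/* Replay: CPU B (munlock path) wins the isolation race instead. */
static struct page_model replay_munlock_wins(void)
{
    struct page_model p = { .mlocked = false, .on_lru = true, .munlock_scans = 0 };

    p.mlocked = true;                 /* A: test-and-set */
    bool was_mlocked = p.mlocked;     /* B: test-and-clear */
    p.mlocked = false;

    if (was_mlocked && isolate_model(&p) == 0) { /* B isolates first */
        p.munlock_scans++;                       /* try_to_munlock() stand-in runs */
        putback_model(&p);
    }
    if (isolate_model(&p) == 0)                  /* A then isolates and puts back */
        putback_model(&p);
    return p;
}
```

In both replays the flag ends up clear, but only `replay_munlock_wins()` ever runs the `try_to_munlock()` stand-in; in `replay_mlock_wins()` the counter stays at zero, which is the state the quoted text calls broken.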

I got the impression (see e.g. the munlock_vma_page() comments) that the
whole thing is designed with this possibility in mind. isolate_lru_page()
may fail (presumably in scenarios other than this one as well), and if
try_to_munlock() is not called as a result, then yes, the page might lose the
PageMlocked bit and go to the LRU instead of the unevictable list, but
try_to_unmap() should catch and fix this. That would also explain why
mlock_vma_page() is called from try_to_unmap_cluster().

So if I understand correctly, the PageMlocked bit does not have to be
set correctly 100% of the time; as long as it is set correctly most of
the time, most of these pages will go to the unevictable list and save
vmscan time.

> And this is very much what the page lock serialization would prevent.
> So no, the PageMlocked in *no* way gives serialization. It's an atomic
> bit op, yes, but that only "serializes" in one direction, not when you
> can have a mix of bit setting and clearing.
> 
> So quite frankly, I think you're wrong. The BUG_ON() is correct, or at
> least enforces some kind of ordering. And try_to_unmap_cluster() is
> just broken in calling that without the page being locked. That's my
> opinion. There may be some *other* reason why it all happens to work,
> but no, "TestSetPageMlocked should provide enough race protection" is
> simply not true, and even if it were, it's way too subtle and odd to
> be a good rule.
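
The point about an atomic bit op serializing "in one direction" only can be illustrated with C11 atomics in user space. This is a sketch under assumptions: `test_set_flag()` and `test_clear_flag()` are hypothetical stand-ins for TestSetPageMlocked() and TestClearPageMlocked(), built on `atomic_fetch_or()` and `atomic_fetch_and()`, and a "winner" is a caller that goes on to do the follow-up work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MLOCKED_BIT 1u

/* Hypothetical stand-in for TestSetPageMlocked(): set the bit, return its old value. */
static bool test_set_flag(atomic_uint *flags)
{
    return (atomic_fetch_or(flags, MLOCKED_BIT) & MLOCKED_BIT) != 0;
}

/* Hypothetical stand-in for TestClearPageMlocked(): clear the bit, return its old value. */
static bool test_clear_flag(atomic_uint *flags)
{
    return (atomic_fetch_and(flags, ~MLOCKED_BIT) & MLOCKED_BIT) != 0;
}

/* Two setters: only the caller that saw the bit clear proceeds. */
static int winners_two_setters(void)
{
    atomic_uint flags;
    int winners = 0;

    atomic_init(&flags, 0u);
    if (!test_set_flag(&flags))
        winners++;
    if (!test_set_flag(&flags))
        winners++;
    return winners;
}

/* One setter followed by one clearer: both see the value they want and proceed. */
static int winners_set_then_clear(void)
{
    atomic_uint flags;
    int winners = 0;

    atomic_init(&flags, 0u);
    if (!test_set_flag(&flags))   /* setter saw the bit clear */
        winners++;
    if (test_clear_flag(&flags))  /* clearer saw the bit set */
        winners++;
    return winners;
}
```

With two setters there is exactly one winner, so the bit op excludes the second caller. With a setter and a clearer there are two winners, and nothing in the bit op orders the work each of them does afterwards.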

Right, it was stupid of me to write such a strong statement without any
details. I wanted to review that patch when back at work next week, but
since it came up now, I just wanted to point out that it's in the pipeline
for this bug.

> So I really object to just removing the BUG_ON(). Not with a *lot*
> more explanation as to why these kinds of issues wouldn't matter.
> 
>                  Linus
> 
