[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a80932ef-d029-c82e-d171-ab8bdac8cbdc@suse.cz>
Date: Fri, 4 Nov 2022 16:57:51 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Matthew Wilcox <willy@...radead.org>,
Hyeonggon Yoo <42.hyeyoo@...il.com>
Cc: Hugh Dickins <hughd@...gle.com>,
David Laight <David.Laight@...lab.com>,
Joel Fernandes <joel@...lfernandes.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
rcu@...r.kernel.org
Subject: Re: amusing SLUB compaction bug when CC_OPTIMIZE_FOR_SIZE
On 10/24/22 16:35, Vlastimil Babka wrote:
> On 10/3/22 19:00, Matthew Wilcox wrote:
>> On Sun, Oct 02, 2022 at 02:48:02PM +0900, Hyeonggon Yoo wrote:
>>> Just one more thing, rcu_leak_callback too. RCU seem to use it
>>> internally to catch double call_rcu().
>>>
>>> And some suggestions:
>>> - what about adding runtime WARN() on slab init code to catch
>>> unexpected arch/toolchain issues?
>>> - instead of 4, we may use macro definition? like (PAGE_MAPPING_FLAGS + 1)?
>>
>> I think the real problem here is that isolate_movable_page() is
>> insufficiently paranoid. Looking at the gyrations that GUP and the
>> page cache do to convince themselves that the page they got really is
>> the page they wanted, there are a few missing pieces (eg checking that
>> you actually got a refcount on _this_ page and not some random other
>> page you were temporarily part of a compound page with).
>>
>> This patch does three things:
>>
>> - Turns one of the comments into English. There are some others
>> which I'm still scratching my head over.
>> - Uses a folio to help distinguish which operations are being done
>> to the head vs the specific page (this is somewhat an abuse of the
>> folio concept, but it's acceptable)
>> - Add the aforementioned check that we're actually operating on the
>> page that we think we want to be.
>> - Add a check that the folio isn't secretly a slab.
>>
>> We could put the slab check in PageMapping and call it after taking
>> the folio lock, but that seems pointless. It's the acquisition of
>> the refcount which stabilises the slab flag, not holding the lock.
>>
>
> I would like to have a working safe version in -next, even if we are able
> simplify it later thanks to frozen refcounts. I've made a formal patch of
> yours, but I'm still convinced the slab check needs to be more paranoid so
> it can't observe a false positive __folio_test_movable() while missing the
> folio_test_slab(), hence I added the barriers as in my previous attempt [1].
> Does that work for you and can I add your S-o-b?
>
> [1] https://lore.kernel.org/all/aec59f53-0e53-1736-5932-25407125d4d4@suse.cz/
To move on, I pushed a branch based on a new version of [1] above. It
lacks Matthew's folio parts, which are not IMHO that critical right now,
so can be added later.
It's here:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-6.2/fit_rcu_head
Will also send for formal review soon.
Powered by blists - more mailing lists