[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3bae9c8b-d723-4a08-8579-fa6f3ea508e8@redhat.com>
Date: Fri, 5 Apr 2024 09:07:16 +0200
From: David Hildenbrand <david@...hat.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-s390@...r.kernel.org, kvm@...r.kernel.org,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Janosch Frank <frankja@...ux.ibm.com>,
Claudio Imbrenda <imbrenda@...ux.ibm.com>,
Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
Thomas Huth <thuth@...hat.com>
Subject: Re: [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and
PG_arch_1
On 05.04.24 05:42, Matthew Wilcox wrote:
> On Thu, Apr 04, 2024 at 06:36:37PM +0200, David Hildenbrand wrote:
>> On my journey to remove page_mapcount(), I got hooked up on other folio
>> cleanups that Willy most certainly will enjoy.
>>
>> This series removes the s390x usage of:
>> * page_mapcount() [patches WIP]
>> * page_has_private() [have patches to remove it]
>>
>> ... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail
>> pages of large folios).
>>
>> Further, one "easy" fix upfront.
>
> Looks like you didn't see:
>
> https://lore.kernel.org/linux-s390/20240322161149.2327518-1-willy@infradead.org/
Yes, I only skimmed linux-mm.
I think s390x certainly wants to handle PTE-mapped THP in that code, I
think there are ways to trigger that, we're just mostly lucky that it
doesn't happen in the common case.
But thinking about it, the current page_mapcount() based check could not
possibly have worked for them and rejected any PTE-mapped THP.
So I can just base my changes on top of yours (we might want to get the
first fix in ahead of time).
>
>> ... unfortunately there is one other issue I spotted that I am not
>> tackling in this series, because I am not 100% sure what we want to
>> do: the usage of page_ref_freeze()/folio_ref_freeze() in
>> make_folio_secure() is unsafe. :(
>>
>> In make_folio_secure(), we're holding the folio lock, the mmap lock and
>> the PT lock. So we are protected against concurrent fork(), zap, GUP,
>> swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should
>> also block concurrent GUP-fast very reliably.
>>
>> But if the folio is mapped into multiple page tables, we could see
>> concurrent zapping of the folio, a pagecache folios could get mapped/
>> accessed concurrent, we could see fork() sharing the page in another
>> process, GUP ... trying to adjust the folio refcount while we froze it.
>> Very bad.
>
> Hmmm. Why is that not then a problem for, eg, splitting or migrating?
> Is it because they unmap first and then try to freeze?
Yes, exactly. Using mapcount in combination with ref freezing is
problematic. Except maybe for anonymous folios with mapcount=1, while
holding a bunch of locks to stop anybody from stumbling over that.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists