linux-kernel - Re: [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3bae9c8b-d723-4a08-8579-fa6f3ea508e8@redhat.com>
Date: Fri, 5 Apr 2024 09:07:16 +0200
From: David Hildenbrand <david@...hat.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 linux-s390@...r.kernel.org, kvm@...r.kernel.org,
 Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
 Alexander Gordeev <agordeev@...ux.ibm.com>,
 Christian Borntraeger <borntraeger@...ux.ibm.com>,
 Sven Schnelle <svens@...ux.ibm.com>, Janosch Frank <frankja@...ux.ibm.com>,
 Claudio Imbrenda <imbrenda@...ux.ibm.com>,
 Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
 Thomas Huth <thuth@...hat.com>
Subject: Re: [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and
 PG_arch_1

On 05.04.24 05:42, Matthew Wilcox wrote:
> On Thu, Apr 04, 2024 at 06:36:37PM +0200, David Hildenbrand wrote:
>> On my journey to remove page_mapcount(), I got hooked up on other folio
>> cleanups that Willy most certainly will enjoy.
>>
>> This series removes the s390x usage of:
>> * page_mapcount() [patches WIP]
>> * page_has_private() [have patches to remove it]
>>
>> ... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail
>> pages of large folios).
>>
>> Further, one "easy" fix upfront.
> 
> Looks like you didn't see:
> 
> https://lore.kernel.org/linux-s390/20240322161149.2327518-1-willy@infradead.org/

Yes, I only skimmed linux-mm.

I think s390x certainly wants to handle PTE-mapped THP in that code, I 
think there are ways to trigger that, we're just mostly lucky that it 
doesn't happen in the common case.

But thinking about it, the current page_mapcount() based check could not 
possibly have worked for them and rejected any PTE-mapped THP.

So I can just base my changes on top of yours (we might want to get the 
first fix in ahead of time).

> 
>> ... unfortunately there is one other issue I spotted that I am not
>> tackling in this series, because I am not 100% sure what we want to
>> do: the usage of page_ref_freeze()/folio_ref_freeze() in
>> make_folio_secure() is unsafe. :(
>>
>> In make_folio_secure(), we're holding the folio lock, the mmap lock and
>> the PT lock. So we are protected against concurrent fork(), zap, GUP,
>> swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should
>> also block concurrent GUP-fast very reliably.
>>
>> But if the folio is mapped into multiple page tables, we could see
>> concurrent zapping of the folio, a pagecache folios could get mapped/
>> accessed concurrent, we could see fork() sharing the page in another
>> process, GUP ... trying to adjust the folio refcount while we froze it.
>> Very bad.
> 
> Hmmm.  Why is that not then a problem for, eg, splitting or migrating?
> Is it because they unmap first and then try to freeze?

Yes, exactly. Using mapcount in combination with ref freezing is 
problematic. Except maybe for anonymous folios with mapcount=1, while 
holding a bunch of locks to stop anybody from stumbling over that.

-- 
Cheers,

David / dhildenb