lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2e069441-0bc6-4799-9176-c7a76c51158f@redhat.com>
Date: Mon, 1 Sep 2025 10:04:38 +0200
From: David Hildenbrand <david@...hat.com>
To: Hugh Dickins <hughd@...gle.com>, Matthew Wilcox <willy@...radead.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Will Deacon <will@...nel.org>,
 Shivank Garg <shivankg@....com>, Christoph Hellwig <hch@...radead.org>,
 Keir Fraser <keirf@...gle.com>, Jason Gunthorpe <jgg@...pe.ca>,
 John Hubbard <jhubbard@...dia.com>, Frederick Mayle <fmayle@...gle.com>,
 Peter Xu <peterx@...hat.com>, "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>,
 Johannes Weiner <hannes@...xchg.org>, Vlastimil Babka <vbabka@...e.cz>,
 Alexander Krabler <Alexander.Krabler@...a.com>, Ge Yang
 <yangge1116@....com>, Li Zhe <lizhe.67@...edance.com>,
 Chris Li <chrisl@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
 Axel Rasmussen <axelrasmussen@...gle.com>, Yuanchu Xie <yuanchu@...gle.com>,
 Wei Xu <weixugc@...gle.com>, Konstantin Khlebnikov <koct9i@...il.com>,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 1/7] mm: fix folio_expected_ref_count() when PG_private_2

On 01.09.25 09:52, David Hildenbrand wrote:
> On 01.09.25 03:17, Hugh Dickins wrote:
>> On Mon, 1 Sep 2025, Matthew Wilcox wrote:
>>> On Sun, Aug 31, 2025 at 02:01:16AM -0700, Hugh Dickins wrote:
>>>> 6.16's folio_expected_ref_count() is forgetting the PG_private_2 flag,
>>>> which (like PG_private, but not in addition to PG_private) counts for
>>>> 1 more reference: it needs to be using folio_has_private() in place of
>>>> folio_test_private().
>>>
>>> No, it doesn't.  I know it used to, but no filesystem was actually doing
>>> that.  So I changed mm to match how filesystems actually worked.
>>> I'm not sure if there's still documentation lying around that gets
>>> this wrong or if you're remembering how things used to be documented,
>>> but it's never how any filesystem has ever worked.
>>>
>>> We're achingly close to getting rid of PG_private_2.  I think it's just
>>> ceph and nfs that still use it.
>>
>> I knew you were trying to get rid of it (hurrah! thank you), so when I
>> tried porting my lru_add_drainage to 6.12 I was careful to check whether
>> folio_expected_ref_count() would need to add it to the accounting there:
>> apparently yes; but then I was surprised to find that it's still present
>> in 6.17-rc, I'd assumed it gone long ago.
>>
>> I didn't try to read the filesystems (which could easily have been
>> inconsistent about it) to understand: what convinced me amidst all
>> the confusion was this comment and code in mm/filemap.c:
>>
>> /**
>>    * folio_end_private_2 - Clear PG_private_2 and wake any waiters.
>>    * @folio: The folio.
>>    *
>>    * Clear the PG_private_2 bit on a folio and wake up any sleepers waiting for
>>    * it.  The folio reference held for PG_private_2 being set is released.
>>    *
>>    * This is, for example, used when a netfs folio is being written to a local
>>    * disk cache, thereby allowing writes to the cache for the same folio to be
>>    * serialised.
>>    */
>> void folio_end_private_2(struct folio *folio)
>> {
>> 	VM_BUG_ON_FOLIO(!folio_test_private_2(folio), folio);
>> 	clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
>> 	folio_wake_bit(folio, PG_private_2);
>> 	folio_put(folio);
>> }
>> EXPORT_SYMBOL(folio_end_private_2);
>>
>> That seems to be clear that PG_private_2 is matched by a folio reference,
>> but perhaps you can explain it away - worth changing the comment if so.
>>
>> I was also anxious to work out whether PG_private with PG_private_2
>> would mean +1 or +2: I don't think I found any decisive statement,
>> but traditional use of page_has_private() implied +1; and I expect
>> there's no filesystem which actually could have both on the same folio.
> 
> I think it's "+1", like we used to have.
> 
> I was seriously confused when discovering (iow, concerned about false
> positives):
> 
> 	PG_fscache = PG_private_2,
> 
> But in the end PG_fscache is only used in comments and e.g.,
> __fscache_clear_page_bits() calls folio_end_private_2(). So both are
> really just aliases.
> 
> [Either PG_fscache should be dropped and referred to as PG_private_2, or
> PG_private_2 should be dropped and PG_fscache used instead. It's even
> inconsistently used in that fscache. file.
> 
> Or both should be dropped, of course, once we can actually get rid of it
> ...]
> 
> So PG_private_2 should not be used for any other purpose.
> 
> folio_start_private_2() / folio_end_private_2() indeed pair the flag
> with a reference. There are no other callers that would set/clear the
> flag without involving a reference.
> 
> The usage of private_2 is declared deprecated all over the place. So the
> question is if we really still care.
> 
> The ceph usage is guarded by CONFIG_CEPH_FSCACHE, the NFS one by
> NFS_FSCACHE, nothing really seems to prevent it from getting configured
> in easily.
> 
> Now, one problem would be if migration / splitting / ... code where we
> use folio_expected_ref_count() cannot deal with that additional
> reference properly, in which case this patch would indeed cause harm.
> 
> If all folio_expected_ref_count() callers can deal with updating that
> reference, all good.
> 
> nfs_migrate_folio(), for example, has folio_test_private_2() handling in
> there (just wait until it is gone). ceph handles it during
> ceph_writepages_start(), but uses ordinary filemap_migrate_folio.
> 
> Long story short: this patch is problematic if one
> folio_expected_ref_count() users is not aware of how to handle that
> additional reference.
> 

Case in point, I just stumbled over

commit 682a71a1b6b363bff71440f4eca6498f827a839d
Author: Matthew Wilcox (Oracle) <willy@...radead.org>
Date:   Fri Sep 2 20:46:46 2022 +0100

     migrate: convert __unmap_and_move() to use folios

and

commit 8faa8ef5dd11abe119ad0c8ccd39f2064ca7ed0e
Author: Matthew Wilcox (Oracle) <willy@...radead.org>
Date:   Mon Jun 6 09:34:36 2022 -0400

     mm/migrate: Convert fallback_migrate_page() to fallback_migrate_folio()
     
     Use a folio throughout.  migrate_page() will be converted to
     migrate_folio() later.


where we converted from page_has_private() to folio_test_private(). Maybe
that's all sane, but it raises the question if migration (and maybe splitting)
as a whole is no incompatible with PG_private_2

-- 
Cheers

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ