[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <92def216-ca9c-402d-8643-226592ca1a85@redhat.com>
Date: Mon, 1 Sep 2025 09:52:16 +0200
From: David Hildenbrand <david@...hat.com>
To: Hugh Dickins <hughd@...gle.com>, Matthew Wilcox <willy@...radead.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Will Deacon <will@...nel.org>,
Shivank Garg <shivankg@....com>, Christoph Hellwig <hch@...radead.org>,
Keir Fraser <keirf@...gle.com>, Jason Gunthorpe <jgg@...pe.ca>,
John Hubbard <jhubbard@...dia.com>, Frederick Mayle <fmayle@...gle.com>,
Peter Xu <peterx@...hat.com>, "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Vlastimil Babka <vbabka@...e.cz>,
Alexander Krabler <Alexander.Krabler@...a.com>, Ge Yang
<yangge1116@....com>, Li Zhe <lizhe.67@...edance.com>,
Chris Li <chrisl@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
Axel Rasmussen <axelrasmussen@...gle.com>, Yuanchu Xie <yuanchu@...gle.com>,
Wei Xu <weixugc@...gle.com>, Konstantin Khlebnikov <koct9i@...il.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 1/7] mm: fix folio_expected_ref_count() when PG_private_2
On 01.09.25 03:17, Hugh Dickins wrote:
> On Mon, 1 Sep 2025, Matthew Wilcox wrote:
>> On Sun, Aug 31, 2025 at 02:01:16AM -0700, Hugh Dickins wrote:
>>> 6.16's folio_expected_ref_count() is forgetting the PG_private_2 flag,
>>> which (like PG_private, but not in addition to PG_private) counts for
>>> 1 more reference: it needs to be using folio_has_private() in place of
>>> folio_test_private().
>>
>> No, it doesn't. I know it used to, but no filesystem was actually doing
>> that. So I changed mm to match how filesystems actually worked.
>> I'm not sure if there's still documentation lying around that gets
>> this wrong or if you're remembering how things used to be documented,
>> but it's never how any filesystem has ever worked.
>>
>> We're achingly close to getting rid of PG_private_2. I think it's just
>> ceph and nfs that still use it.
>
> I knew you were trying to get rid of it (hurrah! thank you), so when I
> tried porting my lru_add_drainage to 6.12 I was careful to check whether
> folio_expected_ref_count() would need to add it to the accounting there:
> apparently yes; but then I was surprised to find that it's still present
> in 6.17-rc, I'd assumed it gone long ago.
>
> I didn't try to read the filesystems (which could easily have been
> inconsistent about it) to understand: what convinced me amidst all
> the confusion was this comment and code in mm/filemap.c:
>
> /**
> * folio_end_private_2 - Clear PG_private_2 and wake any waiters.
> * @folio: The folio.
> *
> * Clear the PG_private_2 bit on a folio and wake up any sleepers waiting for
> * it. The folio reference held for PG_private_2 being set is released.
> *
> * This is, for example, used when a netfs folio is being written to a local
> * disk cache, thereby allowing writes to the cache for the same folio to be
> * serialised.
> */
> void folio_end_private_2(struct folio *folio)
> {
> VM_BUG_ON_FOLIO(!folio_test_private_2(folio), folio);
> clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
> folio_wake_bit(folio, PG_private_2);
> folio_put(folio);
> }
> EXPORT_SYMBOL(folio_end_private_2);
>
> That seems to be clear that PG_private_2 is matched by a folio reference,
> but perhaps you can explain it away - worth changing the comment if so.
>
> I was also anxious to work out whether PG_private with PG_private_2
> would mean +1 or +2: I don't think I found any decisive statement,
> but traditional use of page_has_private() implied +1; and I expect
> there's no filesystem which actually could have both on the same folio.
I think it's "+1", like we used to have.
I was seriously confused when discovering (iow, concerned about false
positives):
PG_fscache = PG_private_2,
But in the end PG_fscache is only used in comments and e.g.,
__fscache_clear_page_bits() calls folio_end_private_2(). So both are
really just aliases.
[Either PG_fscache should be dropped and referred to as PG_private_2, or
PG_private_2 should be dropped and PG_fscache used instead. It's even
inconsistently used in that fscache. file.
Or both should be dropped, of course, once we can actually get rid of it
...]
So PG_private_2 should not be used for any other purpose.
folio_start_private_2() / folio_end_private_2() indeed pair the flag
with a reference. There are no other callers that would set/clear the
flag without involving a reference.
The usage of private_2 is declared deprecated all over the place. So the
question is if we really still care.
The ceph usage is guarded by CONFIG_CEPH_FSCACHE, the NFS one by
NFS_FSCACHE, nothing really seems to prevent it from getting configured
in easily.
Now, one problem would be if migration / splitting / ... code where we
use folio_expected_ref_count() cannot deal with that additional
reference properly, in which case this patch would indeed cause harm.
If all folio_expected_ref_count() callers can deal with updating that
reference, all good.
nfs_migrate_folio(), for example, has folio_test_private_2() handling in
there (just wait until it is gone). ceph handles it during
ceph_writepages_start(), but uses ordinary filemap_migrate_folio.
Long story short: this patch is problematic if one
folio_expected_ref_count() users is not aware of how to handle that
additional reference.
--
Cheers
David / dhildenb
Powered by blists - more mailing lists