Message-ID: <34bafd06-250a-4019-8b34-5ddeedea1cb3@kernel.org>
Date: Mon, 24 Nov 2025 20:22:56 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Zi Yan <ziy@...dia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Nico Pache <npache@...hat.com>,
Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>,
Barry Song <baohua@...nel.org>, Lance Yang <lance.yang@...ux.dev>,
Miaohe Lin <linmiaohe@...wei.com>, Naoya Horiguchi
<nao.horiguchi@...il.com>, Wei Yang <richard.weiyang@...il.com>,
Balbir Singh <balbirs@...dia.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/4] mm/huge_memory: replace can_split_folio() with
direct refcount calculation
On 11/24/25 18:05, Zi Yan wrote:
> On 24 Nov 2025, at 5:41, David Hildenbrand (Red Hat) wrote:
>
>> On 11/22/25 03:55, Zi Yan wrote:
>>> can_split_folio() is just a refcount comparison, making sure only the
>>> split caller holds an extra pin. Open code it with
>>> folio_expected_ref_count() != folio_ref_count() - 1. For the extra_pins
>>> used by folio_ref_freeze(), add folio_cache_references() to calculate it.
>>>
>>> Suggested-by: David Hildenbrand (Red Hat) <david@...nel.org>
>>> Signed-off-by: Zi Yan <ziy@...dia.com>
>>> ---
>>> include/linux/huge_mm.h | 1 -
>>> mm/huge_memory.c | 43 ++++++++++++++++-------------------------
>>> mm/vmscan.c | 3 ++-
>>> 3 files changed, 19 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 97686fb46e30..1ecaeccf39c9 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -369,7 +369,6 @@ enum split_type {
>>> SPLIT_TYPE_NON_UNIFORM,
>>> };
>>> -bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
>>> int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>> unsigned int new_order);
>>> int folio_split_unmapped(struct folio *folio, unsigned int new_order);
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index c1f1055165dd..6c821c1c0ac3 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3455,23 +3455,6 @@ static void lru_add_split_folio(struct folio *folio, struct folio *new_folio,
>>> }
>>> }
>>> -/* Racy check whether the huge page can be split */
>>> -bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
>>> -{
>>> - int extra_pins;
>>> -
>>> - /* Additional pins from page cache */
>>> - if (folio_test_anon(folio))
>>> - extra_pins = folio_test_swapcache(folio) ?
>>> - folio_nr_pages(folio) : 0;
>>> - else
>>> - extra_pins = folio_nr_pages(folio);
>>> - if (pextra_pins)
>>> - *pextra_pins = extra_pins;
>>> - return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins -
>>> - caller_pins;
>>> -}
>>> -
>>> static bool page_range_has_hwpoisoned(struct page *page, long nr_pages)
>>> {
>>> for (; nr_pages; page++, nr_pages--)
>>> @@ -3776,17 +3759,26 @@ int folio_check_splittable(struct folio *folio, unsigned int new_order,
>>> return 0;
>>> }
>>> +/* Number of folio references from the pagecache or the swapcache. */
>>> +static unsigned int folio_cache_references(const struct folio *folio)
>>> +{
>>> + if (folio_test_anon(folio) && !folio_test_swapcache(folio))
>>> + return 0;
>>> + return folio_nr_pages(folio);
>>> +}
>>> +
>>> static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int new_order,
>>> struct page *split_at, struct xa_state *xas,
>>> struct address_space *mapping, bool do_lru,
>>> struct list_head *list, enum split_type split_type,
>>> - pgoff_t end, int *nr_shmem_dropped, int extra_pins)
>>> + pgoff_t end, int *nr_shmem_dropped)
>>> {
>>> struct folio *end_folio = folio_next(folio);
>>> struct folio *new_folio, *next;
>>> int old_order = folio_order(folio);
>>> int ret = 0;
>>> struct deferred_split *ds_queue;
>>> + int extra_pins = folio_cache_references(folio);
>>
>> Can we just inline the call to folio_cache_references() and get rid of extra_pins
>> (which is a bad name either way)?
>>
>>
>> if (folio_ref_freeze(folio, folio_cache_references(folio) + 1)) {
>>
>>
>> BTW, now that we have this helper, I wonder if we should also do the
>> following on the unfreeze path, for clarity:
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 0acdc2f26ee0c..7cbcf61b7971d 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3824,8 +3824,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>> zone_device_private_split_cb(folio, new_folio);
>> - expected_refs = folio_expected_ref_count(new_folio) + 1;
>> - folio_ref_unfreeze(new_folio, expected_refs);
>> + folio_ref_unfreeze(new_folio, folio_cache_references(new_folio) + 1);
>> if (do_lru)
>> lru_add_split_folio(folio, new_folio, lruvec, list);
>> @@ -3868,8 +3867,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>> * Otherwise, a parallel folio_try_get() can grab @folio
>> * and its caller can see stale page cache entries.
>> */
>> - expected_refs = folio_expected_ref_count(folio) + 1;
>> - folio_ref_unfreeze(folio, expected_refs);
>> + folio_ref_unfreeze(folio, folio_cache_references(folio) + 1);
>> if (do_lru)
>> unlock_page_lruvec(lruvec);
>>
>>
>
> Both make sense to me. Will make the change.
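A minimal userspace sketch of the freeze/unfreeze pattern discussed above, assuming that freezing behaves like a compare-and-exchange of the refcount against the expected value. struct model_folio and all model_* names are hypothetical stand-ins for illustration, not kernel code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a folio; not the kernel struct. */
struct model_folio {
	atomic_int refcount;
	unsigned int nr_pages;
	bool anon;
	bool swapcache;
};

/* Same logic as folio_cache_references() in the patch. */
static unsigned int model_cache_references(const struct model_folio *f)
{
	if (f->anon && !f->swapcache)
		return 0;
	return f->nr_pages;
}

/* Succeeds only if the refcount is exactly 'expected'; then sets it to 0. */
static bool model_ref_freeze(struct model_folio *f, int expected)
{
	return atomic_compare_exchange_strong(&f->refcount, &expected, 0);
}

/* Restores the refcount of a frozen folio. */
static void model_ref_unfreeze(struct model_folio *f, int count)
{
	assert(atomic_load(&f->refcount) == 0);
	atomic_store_explicit(&f->refcount, count, memory_order_release);
}

/* Returns 0 on success, -1 if an unexpected reference blocks the freeze. */
static int model_try_split(struct model_folio *f)
{
	/* Cache references plus the single pin held by the split caller. */
	int expected = (int)model_cache_references(f) + 1;

	if (!model_ref_freeze(f, expected))
		return -1;
	/* ... the actual split work would happen here ... */
	model_ref_unfreeze(f, expected);
	return 0;
}
```

In this model, any reference beyond the cache references and the caller's pin makes the freeze fail and leaves the refcount untouched.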
>
> By comparing folio_cache_references() with folio_expected_ref_count(),
> one difference is that folio_expected_ref_count() does not give right
> refcount for shmem in swapcache.
Good point. Likely nobody runs into that right now because nobody can
really do anything with these folios before they are re-added to the
pagecache or mapped into page tables.
>
> This is the folio_expected_ref_count() code:
>
> if (folio_test_anon(folio)) {
> /* One reference per page from the swapcache. */
> ref_count += folio_test_swapcache(folio) << order;
> } else {
> /* One reference per page from the pagecache. */
> ref_count += !!folio->mapping << order;
> /* One reference from PG_private. */
> ref_count += folio_test_private(folio);
> }
>
> shmem in swapcache means !folio_test_anon(folio) && folio_test_swapcache(folio).
See below, it's actually
folio_test_anon(folio) && folio_test_swapbacked(folio) &&
folio_test_swapcache(folio)
I think ...
> The above code gives 0, but folio_cache_references() gives folio_nr_pages(folio).
> It should not cause any issue, since IIUC shmem in swapcache happens
> when the folio has an additional ref, so
> folio_expected_ref_count() != folio_ref_count() anyway. For split, it is
> not supported yet,
Right.
> so folio_expected_ref_count() in split code does not
> affect shmem in swapcache. But folio_expected_ref_count() should be
> fixed, right?
We should handle it better, agreed.
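To make the discrepancy concrete, here is a small userspace sketch comparing the two calculations. It uses the flag combination from the quoted description (not anon, in the swapcache, no mapping), which is itself under discussion in this thread. struct ref_model and the model_* functions are hypothetical, and the mapcount part of folio_expected_ref_count() is left out:

```c
#include <stdbool.h>

/* Hypothetical model of the folio state relevant to the two helpers. */
struct ref_model {
	unsigned int order;
	bool anon;
	bool swapcache;
	bool has_mapping;	/* stands in for folio->mapping != NULL */
	bool has_private;	/* stands in for folio_test_private() */
};

/* Cache part of folio_expected_ref_count() as quoted above. */
static unsigned int model_expected_cache_refs(const struct ref_model *f)
{
	unsigned int refs = 0;

	if (f->anon) {
		/* One reference per page from the swapcache. */
		refs += (unsigned int)f->swapcache << f->order;
	} else {
		/* One reference per page from the pagecache. */
		refs += (unsigned int)f->has_mapping << f->order;
		/* One reference from PG_private. */
		refs += f->has_private;
	}
	return refs;
}

/* Same logic as folio_cache_references() in the patch. */
static unsigned int model_cache_references(const struct ref_model *f)
{
	if (f->anon && !f->swapcache)
		return 0;
	return 1u << f->order;
}
```

For an order-2 folio in the described shmem-in-swapcache state, the first function yields 0 while the second yields 4; for an anon folio in the swapcache both yield 4.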
Staring at the history of folio_expected_ref_count() once again: back
when we had folio_expected_refs() in migration code, we didn't seem to
handle it either, I think.
-static int folio_expected_refs(struct address_space *mapping,
- struct folio *folio)
-{
- int refs = 1;
- if (!mapping)
- return refs;
-
- refs += folio_nr_pages(folio);
- if (folio_test_private(folio))
- refs++;
-
- return refs;
-}
gup.c doesn't care, because the pages are still mapped.
khugepaged.c similarly.
memfd.c doesn't care because the pages are still in the pagecache.
So I suspect nothing is broken, but the migration case needs a second look.
>
> Like:
>
> if (folio_test_anon(folio)) {
> /* One reference per page from the swapcache. */
> ref_count += folio_test_swapcache(folio) << order;
> } else {
> /* One reference per page from shmem in the swapcache. */
> ref_count += folio_test_swapcache(folio) << order;
> /* One reference per page from the pagecache. */
> ref_count += !!folio->mapping << order;
> /* One reference from PG_private. */
> ref_count += folio_test_private(folio);
> }
>
> or simplified into
>
> if (!folio_test_anon(folio)) {
> /* One reference per page from the pagecache. */
> ref_count += !!folio->mapping << order;
> /* One reference from PG_private. */
> ref_count += folio_test_private(folio);
> }
> /* One reference per page from the swapcache (anon or shmem). */
> ref_count += folio_test_swapcache(folio) << order;
> ?
I think that is incorrect, because the swapcache check can give false
positives (PG_owner_priv_1).
--
Cheers
David