Message-ID: <57536c5a-23dd-4f14-af35-9c5523000e80@amd.com>
Date: Tue, 29 Apr 2025 16:27:04 +0530
From: Shivank Garg <shivankg@....com>
To: David Hildenbrand <david@...hat.com>, Matthew Wilcox <willy@...radead.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, shaggy@...nel.org,
 wangkefeng.wang@...wei.com, jane.chu@...cle.com, ziy@...dia.com,
 donettom@...ux.ibm.com, apopple@...dia.com,
 jfs-discussion@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, syzbot+8bb6fd945af4e0ad9299@...kaller.appspotmail.com
Subject: Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline
 function



On 4/25/2025 1:17 PM, David Hildenbrand wrote:
> On 24.04.25 13:57, Shivank Garg wrote:
>> Hi All,
>>
>> Thank you for reviewing my patch and providing feedback.
>>
>> On 4/24/2025 8:49 AM, Matthew Wilcox wrote:
>>> On Wed, Apr 23, 2025 at 09:25:05AM +0200, David Hildenbrand wrote:
>>>> On 23.04.25 09:22, David Hildenbrand wrote:
>>>>> On 23.04.25 02:36, Matthew Wilcox wrote:
>>>>>> On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
>>>>>>>> +/**
>>>>>>>> + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
>>>>>>>
>>>>>>> "folio_migration_expected_refs"
>>
>> Thank you for catching this, I'll fix it.
>>
>> I wasn't previously aware of using make W=1 to build kernel-docs and
>> check for warnings - this is very useful information for me.
>>
>> I'll add a note to the changelog to better explain why this is needed for JFS.
>>
>>>>>>
>>>>>> What I do wonder is whether we want to have such a specialised
>>>>>> function existing.  We have can_split_folio() in huge_memory.c
>>>>>> which is somewhat more comprehensive and doesn't require the folio to be
>>>>>> unmapped first.
>>>>>
>>>>> I was debating with myself whether we should do the usual "refs from
>>>>> ->private, refs from page table mappings" .. dance, and look up the
>>>>> mapping from the folio instead of passing it in.
>>>>>
>>>>> I concluded that for this (migration) purpose the function is good
>>>>> enough as it is: if abused in wrong context (e.g., still ->private,
>>>>> still page table mappings), it would not fake that there are no
>>>>> unexpected references.
>>>>
>>>> Sorry, I forgot that we still care about the reference from ->private here.
>>>> We expect the folio to be unmapped.
>>>
>>> Right, so just adding in folio_mapcount() will be a no-op for migration,
>>> but enable its reuse by can_split_folio().  Maybe.  Anyway, the way I
>>> explain page refcounts to people (and I need to put this in a document
>>> somewhere):
>>>
>>> There are three types of contribution to the refcount:
>>>
>>>   - Expected.  These are deducible from the folio itself, and they're all
>>>     findable.  You need to figure out what the expected number of
>>>     references to a folio is if you're going to try to freeze it.
>>>     These can be references from the mapcount, the page cache, the swap
>>>     cache, the private data, your call chain.
>>>   - Temporary.  Someone else has found the folio somehow; perhaps through
>>>     the page cache, or by calling GUP or something.  They mean you can't
>>>     freeze the folio because you don't know who has the reference or how
>>>     long they might hold it for.
>>>   - Spurious.  This is like a temporary reference, but worse because if
>>>     you read the code, there should be no way for there to be any temporary
>>>     references to the folio.  Someone's found a stale pointer to this
>>>     folio and has bumped the reference count while they check that the
>>>     folio they have is the one they expected to find.  They're going
>>>     to find out that the pointer they followed is stale and put their
>>>     refcount soon, but in the meantime you still can't freeze the folio.
>>>
>>> So I don't love the idea of having a function with the word "expected"
>>> in the name that returns a value which doesn't take into account all
>>> the potential contributors to the expected value.  And sure we can keep
>>> adding qualifiers to the function name to indicate how it is to be used,
>>> but at some point I think we should say "It's OK for this to be a little
>>> less efficient so we can understand what it means".
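
(For context, a minimal sketch of how the migration path consumes the "Expected"
bucket described above: the computed count is handed to a freeze attempt, and any
temporary or spurious reference on top of it makes the freeze fail. The wrapper
name below is invented for illustration; the real logic lives in
folio_migrate_mapping() in mm/migrate.c and differs in detail.)

/*
 * Illustrative only: the function name is made up; the real check is in
 * folio_migrate_mapping() and friends in mm/migrate.c.
 */
static int sketch_freeze_for_migration(struct address_space *mapping,
                                       struct folio *folio)
{
        /* "Expected" references, deducible from the folio itself. */
        int expected_count = folio_expected_refs(mapping, folio);

        /*
         * Any temporary or spurious reference on top of the expected ones
         * makes the freeze fail; migration then backs off with -EAGAIN
         * instead of guessing who holds the extra reference.
         */
        if (!folio_ref_freeze(folio, expected_count))
                return -EAGAIN;

        /* ... replace the folio in the mapping, then unfreeze ... */
        return 0;
}
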
>>
>> Thank you, Willy, for the detailed explanation about page reference counting.
>> This has helped me understand the concept much better.
>>
>> Based on your explanation and the discussion, I'm summarizing the 2 approaches:
>>
>> 1. Rename folio_migration_expected_refs to folio_migration_expected_base_refs
>>    (or folio_unmapped_base_refs?) to clarify that it does not account for other
>>    potential contributors.
>> 2. Account for all possible contributors to the expected refs:
>> folio_expected_refs(mapping, folio)
>> {   
>>     int refs = 1;
>>
>>     if (mapping) {
>>         if (folio_test_anon(folio))
>>             refs += folio_test_swapcache(folio) ?
>>                 folio_nr_pages(folio) : 0;
>>         else
>>             refs += folio_nr_pages(folio);
>>
>>         if (folio_test_private(folio))
>>             refs++;
>>     }
>>     // Accounts for mapped folios; a no-op for unmapped folios during migration
>>     refs += folio_mapcount(folio);
>>     return refs;
>> }
>>
>> Please let me know if this approach is acceptable or if you have
>> other suggestions for improvement.
> 
> A couple of points:
> 
> 1) Can we name it folio_expected_ref_count()?
> 
> 2) Can we avoid passing in the mapping? Might not be expensive to look it
>    up again. Below I avoid calling folio_mapping().
> 
> 3) Can we delegate adding the additional reference to the caller? Will make it
>    easier to use elsewhere (e.g., no additional reference because we are holding
>    the page table lock).
> 
> 4) Can we add kerneldoc, and in particular document the semantics?
> 
> Not sure if we should inline this function or put it into mm/utils.c
> 

Hi David,

Thank you for the detailed suggestions. They all make sense to me.

There are a few changes in your patch below that I did not understand:
> 
> I'm thinking of something like (completely untested):
> 
>  
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a205020e2a58b..a0ad4ed9a75ff 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2112,6 +2112,61 @@ static inline bool folio_maybe_mapped_shared(struct folio *folio)
>      return folio_test_large_maybe_mapped_shared(folio);
>  }
>  
> +/**
> + * folio_expected_ref_count - calculate the expected folio refcount
> + * @folio: the folio
> + *
> + * Calculate the expected folio refcount, taking references from the pagecache,
> + * swapcache, PG_private and page table mappings into account. Useful in
> + * combination with folio_ref_count() to detect unexpected references (e.g.,
> + * GUP or other temporary references).
> + *
> + * Does currently not consider references from the LRU cache. If the folio
> + * was isolated from the LRU (which is the case during migration or split),
> + * the LRU cache does not apply.
> + *
> + * Calling this function on an unmapped folio -- !folio_mapped() -- that is
> + * locked will return a stable result.
> + *
> + * Calling this function on a mapped folio will not result in a stable result,
> + * because nothing stops additional page table mappings from coming (e.g.,
> + * fork()) or going (e.g., munmap()).
> + *
> + * Calling this function without the folio lock will also not result in a
> + * stable result: for example, the folio might get dropped from the swapcache
> + * concurrently.
> + *
> + * However, even when called without the folio lock or on a mapped folio,
> + * this function can be used to detect unexpected references early (for example,
> + * to decide whether it makes sense to even lock the folio and unmap it).
> + *
> + * The caller must add any reference (e.g., from folio_try_get()) it might be
> + * holding itself to the result.
> + *
> + * Returns the expected folio refcount.
> + */
> +static inline int folio_expected_ref_count(const struct folio *folio)
> +{
> +    const int order = folio_order(folio);
> +    int ref_count = 0;

Why are we not taking the base ref_count as 1, as is done in the original
folio_expected_refs() implementation?

> +
> +    if (WARN_ON_ONCE(folio_test_slab(folio)))
> +        return 0;
> +
> +    if (folio_test_anon(folio)) {
> +        /* One reference per page from the swapcache. */
> +        ref_count += folio_test_swapcache(folio) << order;

Why not use folio_nr_pages() here instead of 1 << order?
Something like folio_test_swapcache(folio) * folio_nr_pages(folio).

> +    } else if (!((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS)) {
> +        /* One reference per page from the pagecache. */
> +        ref_count += !!folio->mapping << order;
> +        /* One reference from PG_private. */
> +        ref_count += folio_test_private(folio);
> +    }
> +
> +    /* One reference per page table mapping. */
> +    return ref_count + folio_mapcount(folio);

> +}
> +
>  #ifndef HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE
>  static inline int arch_make_folio_accessible(struct folio *folio)
>  {

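(A minimal caller-side sketch of what point 3 above implies, assuming the helper
lands under the name used in the patch; the call site and function name below are
hypothetical:)

/* Hypothetical call site: the caller holds one reference of its own. */
static bool refs_are_expected(struct folio *folio)
{
        /*
         * folio_expected_ref_count() deliberately excludes references the
         * caller itself holds, so the +1 for our own reference (e.g., from
         * folio_try_get() when isolating the folio) is added here.
         */
        return folio_ref_count(folio) == folio_expected_ref_count(folio) + 1;
}
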
I tested your patch with stress-ng and my move-pages test code. I did not see
any bugs/errors.
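
(For reference, a stripped-down userspace exercise of that path via move_pages(2)
could look like the sketch below; link with -lnuma. This is purely illustrative,
not Shivank's actual test code, and it assumes a machine with at least two NUMA
nodes:)

#include <numaif.h>     /* move_pages(), MPOL_MF_MOVE */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        long page_size = sysconf(_SC_PAGESIZE);
        int target_node = 1;    /* assumption: node 1 exists */
        int status = -1;
        void *page;

        if (posix_memalign(&page, page_size, page_size))
                return 1;
        memset(page, 0xaa, page_size);  /* fault the page in */

        /* Ask the kernel to migrate this page of the current process. */
        if (move_pages(0, 1, &page, &target_node, &status, MPOL_MF_MOVE) < 0)
                perror("move_pages");
        else
                printf("status: %d\n", status); /* node on success, -errno per page otherwise */

        free(page);
        return 0;
}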

Thanks,
Shivank
