linux-kernel - Re: [PATCH V4 1/2] mm: add folio_migration_expected

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <7ff5b149-534d-4bb7-8c6d-3147279d3fae@amd.com>
Date: Thu, 24 Apr 2025 17:27:47 +0530
From: Shivank Garg <shivankg@....com>
To: Matthew Wilcox <willy@...radead.org>, David Hildenbrand <david@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, shaggy@...nel.org,
 wangkefeng.wang@...wei.com, jane.chu@...cle.com, ziy@...dia.com,
 donettom@...ux.ibm.com, apopple@...dia.com,
 jfs-discussion@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, syzbot+8bb6fd945af4e0ad9299@...kaller.appspotmail.com
Subject: Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline
 function

Hi All,

Thank you for reviewing my patch and providing feedback.

On 4/24/2025 8:49 AM, Matthew Wilcox wrote:
> On Wed, Apr 23, 2025 at 09:25:05AM +0200, David Hildenbrand wrote:
>> On 23.04.25 09:22, David Hildenbrand wrote:
>>> On 23.04.25 02:36, Matthew Wilcox wrote:
>>>> On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
>>>>>> +/**
>>>>>> + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
>>>>>
>>>>> "folio_migration_expected_refs"

Thank you for catching this, I'll fix it.

I wasn't previously aware of using make W=1 to build kernel-docs and
check for warnings - this is very useful information for me.

I'll add to changelog to better explain why this is needed for JFS.

>>>>
>>>> What I do wonder is whether we want to have such a specialised
>>>> function existing.  We have can_split_folio() in huge_memory.c
>>>> which is somewhat more comprehensive and doesn't require the folio to be
>>>> unmapped first.
>>>
>>> I was debating with myself whether we should do the usual "refs from
>>> ->private, refs from page table mappings" .. dance, and look up the
>>> mapping from the folio instead of passing it in.
>>>
>>> I concluded that for this (migration) purpose the function is good
>>> enough as it is: if abused in wrong context (e.g., still ->private,
>>> still page table mappings), it would not fake that there are no
>>> unexpected references.
>>
>> Sorry, I forgot that we still care about the reference from ->private here.
>> We expect the folio to be unmapped.
> 
> Right, so just adding in folio_mapcount() will be a no-op for migration,
> but enable its reuse by can_split_folio().  Maybe.  Anyway, the way I
> explain page refocunts to people (and I need to put this in a document
> somewhere):
> 
> There are three types of contribution to the refcount:
> 
>  - Expected.  These are deducible from the folio itself, and they're all
>    findable.  You need to figure out what the expected number of
>    references are to a folio if you're going to try to freeze it.
>    These can be references from the mapcount, the page cache, the swap
>    cache, the private data, your call chain.
>  - Temporary.  Someone else has found the folio somehow; perhaps through
>    the page cache, or by calling GUP or something.  They mean you can't
>    freeze the folio because you don't know who has the reference or how
>    long they might hold it for.
>  - Spurious.  This is like a temporary reference, but worse because if
>    you read the code, there should be no way for there to be any temporary
>    references to the folio.  Someone's found a stale pointer to this
>    folio and has bumped the reference count while they check that the
>    folio they have is the one they expected to find.  They're going
>    to find out that the pointer they followed is stale and put their
>    refcount soon, but in the meantime you still can't freeze the folio.
> 
> So I don't love the idea of having a function with the word "expected"
> in the name that returns a value which doesn't take into account all
> the potential contributors to the expected value.  And sure we can keep
> adding qualifiers to the function name to indicate how it is to be used,
> but at some point I think we should say "It's OK for this to be a little
> less efficient so we can understand what it means".

Thank you, Willy, for the detailed explanation about page reference counting.
This has helped me understand the concept much better.

Based on your explanation and the discussion, I'm summarizing the 2 approaches:

1. Rename folio_migration_expected_refs to folio_migration_expected_base_refs, to
to clarify it does not account for other potential contributors.
or folio_unmapped_base_refs?
2. Accounting all possible contributors to expected refs:
folio_expected_refs(mapping, folio)
{	
	int refs = 1;

	if (mapping) {
		if (folio_test_anon(folio))
			refs += folio_test_swapcache(folio) ?
				folio_nr_pages(folio) : 0;
		else
			refs += folio_nr_pages(folio);

		if (folio_test_private(folio))
			refs++;
	}
	refs += folio_mapcount(folio); // takes mapped folio into account and evaluate as no-op for unmapped folios during migration
	return refs;
}

Please let me know if this approach is acceptable or if you have
other suggestions for improvement.

Best Regards,
Shivank