linux-kernel - Re: [RFC PATCH v0 2/2] mm: sched: Batch-migrate misplaced pages

Message-ID: <94BF4806-ABCD-4D01-8577-9E138A634815@nvidia.com>
Date: Mon, 26 May 2025 10:20:39 -0400
From: Zi Yan <ziy@...dia.com>
To: David Hildenbrand <david@...hat.com>
Cc: Bharata B Rao <bharata@....com>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, Jonathan.Cameron@...wei.com, dave.hansen@...el.com,
 gourry@...rry.net, hannes@...xchg.org, mgorman@...hsingularity.net,
 mingo@...hat.com, peterz@...radead.org, raghavendra.kt@....com,
 riel@...riel.com, rientjes@...gle.com, sj@...nel.org, weixugc@...gle.com,
 willy@...radead.org, ying.huang@...ux.alibaba.com, dave@...olabs.net,
 nifan.cxl@...il.com, joshua.hahnjy@...il.com, xuezhengchu@...wei.com,
 yiannis@...corp.com, akpm@...ux-foundation.org
Subject: Re: [RFC PATCH v0 2/2] mm: sched: Batch-migrate misplaced pages

On 26 May 2025, at 5:29, David Hildenbrand wrote:

> On 22.05.25 19:30, Zi Yan wrote:
>> On 22 May 2025, at 13:21, David Hildenbrand wrote:
>>
>>> On 22.05.25 18:38, Zi Yan wrote:
>>>> On 22 May 2025, at 12:26, David Hildenbrand wrote:
>>>>
>>>>> On 22.05.25 18:24, Zi Yan wrote:
>>>>>> On 22 May 2025, at 12:11, David Hildenbrand wrote:
>>>>>>
>>>>>>> On 21.05.25 10:02, Bharata B Rao wrote:
>>>>>>>> Currently the folios identified as misplaced by the NUMA
>>>>>>>> balancing sub-system are migrated one by one from the NUMA
>>>>>>>> hint fault handler as and when they are identified as
>>>>>>>> misplaced.
>>>>>>>>
>>>>>>>> Instead of such single folio migrations, batch them and
>>>>>>>> migrate them at once.
>>>>>>>>
>>>>>>>> Identified misplaced folios are isolated and stored in
>>>>>>>> a per-task list. A new task_work is queued from task tick
>>>>>>>> handler to migrate them in batches. Migration is done
>>>>>>>> periodically or if pending number of isolated folios exceeds
>>>>>>>> a threshold.
>>>>>>>
>>>>>>> That means that these pages are effectively unmovable for other purposes (CMA, compaction, long-term pinning, whatever) until that list is drained.
>>>>>>>
>>>>>>> Bad.
>>>>>>
>>>>>> Probably we can mark these pages, and when others want to migrate the page,
>>>>>> get_new_page() just looks at the page's target node and gets a new page from
>>>>>> the target node.
>>>>>
>>>>> How do you envision that working when CMA needs to migrate this exact page to a different location?
>>>>>
>>>>> It cannot isolate it for migration because ... it's already isolated ... so it will give up.
>>>>>
>>>>> Marking might not be easy I assume ...
>>>>
>>>> I guess you mean we do not have any extra bit to indicate this page is isolated,
>>>> but it can be migrated. My point is that if this page is going to be migrated
>>>> due to other reasons, like CMA, compaction, why not migrate it to the target
>>>> node instead of moving it around within the same node.
>>>
>>> I think we'd have to identify that
>>>
>>> a) This page is isolated for migration (could be isolated for other
>>>     reasons)
>>>
>>> b) The one responsible for the isolation is numa code (could be someone
>>>     else)
>>>
>>> c) We're allowed to grab that page from that list (IOW sync against
>>>     others, and especially also against), to essentially "steal" the
>>>     isolated page.
>>
>> Right. c) sounds like adding more contention to the candidate list.
>> I wonder if we can just mark the page as migration candidate (using
>> a page flag or something else), then migrate it whenever CMA,
>> compaction, long-term pinning and more look at the page.
>
> I mean, all these will migrate the page either way, no need to add another flag for that.
>
> I guess what you mean is indicating that the migration destination should be on a different node than the current one.

Yes.

>
> Well, and for the NUMA scanner (below) to find which pages to migrate.
>
> ... to me this raises some questions: like, if we don't migrate immediately, could that information ("migrate this page") actually now be wrong? I guess a way to

Could be. So it is better to re-evaluate the page right before the actual migration,
in case the page is no longer needed on a remote node.

> obtain the destination node would suffice: if the destination node matches, no need to migrate from that NUMA scanner.

Right. The destination node could be calculated from some metric, such as the most
recent accesses or the last remote node access time. If the most recent accesses are
still coming from a remote node and/or the last remote node access falls within a
short time frame, the page should be migrated. Otherwise, a page could be frequently
accessed by a remote node yet no longer be needed there by the time migration happens;
its access pattern would then look like 1) a lot of remote node accesses, but
2) the last remote node access was a long time ago.
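
To make that re-evaluation concrete, here is a minimal userspace sketch in C.
The struct, its fields, and the function name are hypothetical, not existing
kernel interfaces. It combines the two signals with AND, although either one
alone could also be used:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page access summary; not an existing kernel structure. */
struct access_stats {
	unsigned int local_hits;   /* recent accesses from the page's current node */
	unsigned int remote_hits;  /* recent accesses from the candidate target node */
	uint64_t last_remote_ns;   /* timestamp of the last remote access */
};

/*
 * Re-check a migration candidate right before migrating it.
 * Returns true only if remote accesses still dominate AND the last remote
 * access happened within window_ns of now_ns.
 */
static bool still_worth_migrating(const struct access_stats *s,
				  uint64_t now_ns, uint64_t window_ns)
{
	bool mostly_remote = s->remote_hits > s->local_hits;
	bool recent_remote = now_ns >= s->last_remote_ns &&
			     now_ns - s->last_remote_ns <= window_ns;

	return mostly_remote && recent_remote;
}
```

A page with many remote hits but an old last_remote_ns is rejected, which is
the "1) a lot of remote accesses, 2) long time ago" case.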

>
> In addition,
>> periodically, the migration task would do a PFN scanning and migrate
>> any migration candidate. I remember Willy did some experiments showing
>> that PFN scanning is very fast.
>
> PFN scanning can be faster than walking lists, but I suspect it depends on how many pages there really are to be migrated ... and some other factors :)

Yes. The LRU list is good since it restricts the scanning range, but PFN scanning
itself has no such restriction. PFN scanning with some filter mechanism might work,
and that filter mechanism is a way of marking to-be-migrated pages. Of course,
a quick re-evaluation of the to-be-migrated pages right before a migration
would avoid unnecessary work, as we discussed above.
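
The filter-plus-re-check idea could look roughly like the userspace sketch
below. The marker bit, struct fake_page, and both helpers are hypothetical. A
real implementation would operate on struct page/folio and would need proper
locking and reference handling:

```c
#include <stdbool.h>
#include <stddef.h>

#define PG_MIGRATE_CANDIDATE 0x1u  /* hypothetical marker bit, not a real page flag */

/* Hypothetical stand-in for a page descriptor, indexed by PFN. */
struct fake_page {
	unsigned int flags;
	int nid;         /* node the page currently resides on */
	int target_nid;  /* node chosen when the page was marked */
};

/* Stand-in re-check: the page is still misplaced if the nodes differ. */
static bool still_misplaced(const struct fake_page *p)
{
	return p->target_nid != p->nid;
}

/*
 * Scan PFNs in [start, end), skip unmarked pages, re-check marked ones,
 * and collect up to max_out PFNs that should still be migrated.
 * Stale candidates get their mark cleared. Returns the number collected.
 */
static size_t scan_pfn_range(struct fake_page *pages, size_t start, size_t end,
			     size_t *out, size_t max_out)
{
	size_t n = 0;

	for (size_t pfn = start; pfn < end && n < max_out; pfn++) {
		struct fake_page *p = &pages[pfn];

		if (!(p->flags & PG_MIGRATE_CANDIDATE))
			continue;  /* filter: cheap skip of unmarked pages */

		if (!still_misplaced(p)) {
			p->flags &= ~PG_MIGRATE_CANDIDATE;  /* stale mark */
			continue;
		}
		out[n++] = pfn;
	}
	return n;
}
```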

--
Best Regards,
Yan, Zi