Message-ID: <87sekrbvyr.fsf@DESKTOP-5N7EMDA>
Date: Mon, 26 May 2025 16:46:36 +0800
From: "Huang, Ying" <ying.huang@...ux.alibaba.com>
To: Bharata B Rao <bharata@....com>
Cc: <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<Jonathan.Cameron@...wei.com>, <dave.hansen@...el.com>,
<gourry@...rry.net>, <hannes@...xchg.org>,
<mgorman@...hsingularity.net>, <mingo@...hat.com>,
<peterz@...radead.org>, <raghavendra.kt@....com>, <riel@...riel.com>,
<rientjes@...gle.com>, <sj@...nel.org>, <weixugc@...gle.com>,
<willy@...radead.org>, <ziy@...dia.com>, <dave@...olabs.net>,
<nifan.cxl@...il.com>, <joshua.hahnjy@...il.com>,
<xuezhengchu@...wei.com>, <yiannis@...corp.com>,
<akpm@...ux-foundation.org>, <david@...hat.com>
Subject: Re: [RFC PATCH v0 0/2] Batch migration for NUMA balancing

Hi, Bharata,

Bharata B Rao <bharata@....com> writes:

> Hi,
>
> This is an attempt to convert NUMA balancing to do batched
> migration instead of migrating one folio at a time. The basic
> idea is to collect (from the hint fault handler) the folios to be
> migrated in a list and batch-migrate them from task_work context.
> More details about the specifics are present in patch 2/2.
>
> During LSFMM[1] and subsequent discussions in MM alignment calls[2],
> it was suggested that separate migration threads to handle migration
> or promotion requests may be desirable. Existing NUMA balancing, hot
> page promotion and other future promotion techniques could off-load
> the migration part to these threads.

What is the expected benefit of the change?

For code reuse, we can call migrate_misplaced_folio() or
migrate_misplaced_folio_batch() in the various promotion paths.

As for the impact on workload latency, my understanding is that PTE
scanning hurts much more than migration does. Why not start from that?

> Or if we manage to have a single
> source of hotness truth like kpromoted[3], then that too can hand
> over migration requests to the migration threads. I am envisaging
> that different hotness sources like kmmscand[4], MGLRU[5], IBS[6]
> and CXL HMU would push hot page info to kpromoted, which would
> then isolate and push the folios to be promoted to the migrator
> thread.
>
> As a first step, this is an attempt to batch and perform NUMAB
> migrations in an async manner. Separate migration threads aren't
> yet implemented but I am using Gregory's patch[7] that provides the
> migrate_misplaced_folio_batch() API to do batch migration of
> misplaced folios.
>
> Some points for discussion
> --------------------------
> 1. To isolate the misplaced folios or not?
>
> To do batch migration, the misplaced folios need to be stored in
> some manner. I thought isolating them and using the folio->lru
> field to link them up would be the most straightforward way. But
> then concerns were expressed about folios remaining isolated
> for a long time until they get migrated.
>
> Or should we just maintain the PFNs instead of folios and
> isolate them only just prior to migrating them?
>
> 2. Managing target_nid for misplaced pages
>
> NUMAB provides the accurate target_nid for each folio that is
> detected as misplaced. However, when we don't migrate the folio
> right away, but instead want to batch and do async migration later,
> then where do we keep track of target_nid for each folio?
>
> In this implementation, I am using last_cpupid field as it appeared
> that this field could be reused (with some challenges mentioned
> in 2/2) for isolated folios. This approach may be specific to NUMAB
> but then each sub-system that hands over pages to the migrator thread
> should also provide a target_nid and hence each sub-system should be
> free to maintain and track the target_nid of folios that it has
> isolated/batched for migration in its own specific manner.
>
> 3. How many folios to batch?
>
> Currently I have a fixed threshold for the number of folios to batch.
> It could be a sysctl to allow a setting between a min and max. It
> could also be auto-tuned if required.
>
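
To make the three points above concrete, here is a minimal userspace
sketch of the collect-then-flush pattern. It is not code from the
patchset. Every name in it (fake_folio, migrate_batch, batch_add,
batch_flush, BATCH_MIN, BATCH_MAX) is hypothetical, the singly linked
list only stands in for folio->lru linkage, and the flush merely counts
entries where a real implementation would migrate each folio to its
recorded target_nid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds for the batch threshold (assumed, not from the patch). */
#define BATCH_MIN 1
#define BATCH_MAX 512

/* Userspace stand-in for a folio queued for migration. */
struct fake_folio {
	unsigned long pfn;
	int target_nid;           /* recorded when the folio is queued */
	struct fake_folio *next;  /* plays the role of folio->lru linkage */
};

/* Per-task batch of folios waiting to be migrated. */
struct migrate_batch {
	struct fake_folio *head;
	size_t count;
	size_t threshold;
};

/* Clamp a requested threshold into [BATCH_MIN, BATCH_MAX]. */
static size_t clamp_threshold(size_t requested)
{
	if (requested < BATCH_MIN)
		return BATCH_MIN;
	if (requested > BATCH_MAX)
		return BATCH_MAX;
	return requested;
}

static void batch_init(struct migrate_batch *b, size_t requested)
{
	b->head = NULL;
	b->count = 0;
	b->threshold = clamp_threshold(requested);
}

/*
 * Stand-in for the hint-fault side: remember the target node, queue the
 * folio, and return 1 once the threshold is reached so the caller can
 * schedule the deferred flush.
 */
static int batch_add(struct migrate_batch *b, struct fake_folio *f,
		     int target_nid)
{
	f->target_nid = target_nid;
	f->next = b->head;
	b->head = f;
	b->count++;
	return b->count >= b->threshold;
}

/*
 * Stand-in for the deferred (task_work-like) side: drain the list and
 * return how many entries were handled.
 */
static size_t batch_flush(struct migrate_batch *b)
{
	size_t handled = 0;

	while (b->head) {
		struct fake_folio *f = b->head;

		b->head = f->next;
		f->next = NULL;
		/* A real implementation would migrate f to f->target_nid here. */
		handled++;
	}
	b->count = 0;
	return handled;
}
```

Keeping target_nid next to the list linkage is only one option. The
cover letter instead reuses last_cpupid for isolated folios, and a
PFN-based variant would store PFNs here and isolate just before the
flush.
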
> The state of the patchset
> -------------------------
> * Still raw and very lightly tested
> * Just posted to serve as base for subsequent discussions
> here and in MM alignment calls.
>
> References
> ----------
> [1] LSFMM LWN summary - https://lwn.net/Articles/1016519/
> [2] MM alignment call summary - https://lore.kernel.org/linux-mm/263d7140-c343-e82e-b836-ec85c52b54eb@google.com/
> [3] kpromoted patchset - https://lore.kernel.org/linux-mm/20250306054532.221138-1-bharata@amd.com/
> [4] Kmmscand: PTE A bit scanning - https://lore.kernel.org/linux-mm/20250319193028.29514-1-raghavendra.kt@amd.com/
> [5] MGLRU scanning for page promotion - https://lore.kernel.org/lkml/20250324220301.1273038-1-kinseyho@google.com/
> [6] IBS-based hot page promotion - https://lore.kernel.org/linux-mm/20250306054532.221138-4-bharata@amd.com/
> [7] Unmapped page cache folio promotion patchset - https://lore.kernel.org/linux-mm/20250411221111.493193-1-gourry@gourry.net/
>
> Bharata B Rao (1):
>   mm: sched: Batch-migrate misplaced pages
>
> Gregory Price (1):
>   migrate: implement migrate_misplaced_folio_batch
>
>  include/linux/migrate.h |  6 ++++
>  include/linux/sched.h   |  4 +++
>  init/init_task.c        |  2 ++
>  kernel/sched/fair.c     | 64 +++++++++++++++++++++++++++++++++++++++++
>  mm/memory.c             | 44 ++++++++++++++--------------
>  mm/migrate.c            | 31 ++++++++++++++++++++
>  6 files changed, 130 insertions(+), 21 deletions(-)
---
Best Regards,
Huang, Ying