Message-ID: <qcasnjdjet57uxhwavfiaxepq7anf2zvmi4rzkp5lxysovqwme@wwcyh4nvlxiv>
Date: Tue, 27 May 2025 10:48:23 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: "Chen, Yu C" <yu.c.chen@...el.com>
Cc: peterz@...radead.org, akpm@...ux-foundation.org, mkoutny@...e.com,
mingo@...hat.com, tj@...nel.org, hannes@...xchg.org, corbet@....net,
mgorman@...e.de, mhocko@...nel.org, muchun.song@...ux.dev,
roman.gushchin@...ux.dev, tim.c.chen@...el.com, aubrey.li@...el.com, libo.chen@...cle.com,
kprateek.nayak@....com, vineethr@...ux.ibm.com, venkat88@...ux.ibm.com, ayushjai@....com,
cgroups@...r.kernel.org, linux-doc@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, yu.chen.surf@...mail.com
Subject: Re: [PATCH v5 2/2] sched/numa: add statistics of numa balance task

On Sun, May 25, 2025 at 08:35:24PM +0800, Chen, Yu C wrote:
> On 5/25/2025 1:32 AM, Shakeel Butt wrote:
[...]
> > can you please give an end-to-end flow/story of all these events
> > happening on a timeline.
> >
>
> Yes, sure, let me have a try.
>
> The goal of NUMA balancing is to co-locate a task and its
> memory pages on the same NUMA node. There are two strategies:
> migrate the pages to the task's node, or migrate the task to
> the node where its pages reside.
>
> Suppose a task p1 is running on Node 0, but its pages are
> located on Node 1. NUMA page fault statistics for p1 reveal
> its "page footprint" across nodes. If NUMA balancing detects
> that most of p1's pages are on Node 1:
>
> 1. Page Migration Attempt:
> NUMA balancing first tries to migrate p1's pages to Node 0.
> The numa_page_migrate counter increments.
>
> 2. Task Migration Strategies:
> After the page migration finishes, NUMA balancing checks every
> 1 second to see whether p1 can be migrated to Node 1.
>
> Case 2.1: Idle CPU Available
> If Node 1 has an idle CPU, p1 is scheduled there directly. This event is
> logged as numa_task_migrated.
>
> Case 2.2: No Idle CPU (Task Swap)
> If all CPUs on Node 1 are busy, direct migration could cause CPU contention
> or load imbalance. Instead:
> - NUMA balancing selects a candidate task p2 on Node 1 that prefers
>   Node 0 (e.g., due to its own page footprint).
> - p1 and p2 are swapped. This cross-node swap is recorded as
>   numa_task_swapped.
>

Thanks for the explanation. This is really helpful, and I would like it
to be included in the commit message.
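
To check that the task-migration half of the flow above reads the way I
think it does, here is a toy userspace model of it. This is only a sketch
and not kernel code: Task, Node and try_task_migration() are made-up names,
the idle-CPU bookkeeping is simplified, page migration (step 1) is left out,
and counting one numa_task_swapped per swap (rather than one per task) is an
assumption on my side.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare tasks by identity, not by field values
class Task:
    name: str
    node: int            # node the task currently runs on
    preferred_node: int  # node holding most of the task's pages


@dataclass
class Node:
    idle_cpus: int = 0
    tasks: list = field(default_factory=list)


def try_task_migration(p, nodes, stats):
    """One periodic attempt to move task p to its preferred node (toy model)."""
    if p.node == p.preferred_node:
        return "none"
    src, dst = nodes[p.node], nodes[p.preferred_node]

    if dst.idle_cpus > 0:
        # Case 2.1: an idle CPU exists on the preferred node.
        dst.idle_cpus -= 1
        src.idle_cpus += 1  # assumption: the CPU p leaves becomes idle
        src.tasks.remove(p)
        dst.tasks.append(p)
        p.node = p.preferred_node
        stats["numa_task_migrated"] += 1
        return "migrated"

    # Case 2.2: no idle CPU; look for a task that prefers p's current node.
    q = next((t for t in dst.tasks if t.preferred_node == p.node), None)
    if q is None:
        return "none"
    src.tasks.remove(p)
    dst.tasks.remove(q)
    src.tasks.append(q)
    dst.tasks.append(p)
    p.node, q.node = q.node, p.node
    # Assumption: one increment per swap, not one per task.
    stats["numa_task_swapped"] += 1
    return "swapped"


def demo():
    """p1 runs on node 0 but prefers node 1; p2 is the mirror image."""
    nodes = {0: Node(), 1: Node()}
    p1, p2 = Task("p1", 0, 1), Task("p2", 1, 0)
    nodes[0].tasks.append(p1)
    nodes[1].tasks.append(p2)
    stats = {"numa_task_migrated": 0, "numa_task_swapped": 0}
    result = try_task_migration(p1, nodes, stats)
    return result, stats
```

With both nodes fully busy, demo() takes the swap path; giving node 1 an
idle CPU instead makes the same call take the direct-migration path.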

> > Besides that, do you think there might be some other scheduling events
> > (maybe unrelated to NUMA balancing) which might be suitable for
> > memory.stat? Basically I am trying to find out whether having sched events in
> > memory.stat is an exception for NUMA balancing or something more general.
>
> If the criterion is a combination of task scheduling strategy and
> page-based operations, I cannot find any other existing scheduling
> events. For now, NUMA balancing seems to be the only case.

Mainly I was checking whether we will need to add more sched events to
the memory.stat file in the future.
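
For context, this is roughly how I would expect a consumer to pick these
events out of memory.stat. It is a sketch only: it relies on the flat
"key value" layout of memory.stat, uses the counter names from the
explanation above, and the helper names and the sample values are made up
for illustration.

```python
def parse_memory_stat(text):
    """Parse flat 'key value' lines, as found in a cgroup's memory.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<key> <unsigned integer>".
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def numa_sched_events(stats, keys=("numa_task_migrated", "numa_task_swapped")):
    """Return the sched-side NUMA counters; 0 if the kernel lacks them."""
    return {key: stats.get(key, 0) for key in keys}


# Illustrative input only; these values are not taken from a real system.
SAMPLE = """\
anon 4096
file 8192
numa_task_migrated 3
numa_task_swapped 1
"""

events = numa_sched_events(parse_memory_stat(SAMPLE))
print(events)
```

On a live system the text would come from reading the cgroup's memory.stat
file (the mount point and group name depend on the setup).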

Let me reply on the other email chain about what we should do next.