Message-ID: <52a1b56b-9598-499d-ac9c-de99479d5166@intel.com>
Date: Sun, 25 May 2025 20:35:24 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
CC: <peterz@...radead.org>, <akpm@...ux-foundation.org>, <mkoutny@...e.com>,
	<mingo@...hat.com>, <tj@...nel.org>, <hannes@...xchg.org>, <corbet@....net>,
	<mgorman@...e.de>, <mhocko@...nel.org>, <muchun.song@...ux.dev>,
	<roman.gushchin@...ux.dev>, <tim.c.chen@...el.com>, <aubrey.li@...el.com>,
	<libo.chen@...cle.com>, <kprateek.nayak@....com>, <vineethr@...ux.ibm.com>,
	<venkat88@...ux.ibm.com>, <ayushjai@....com>, <cgroups@...r.kernel.org>,
	<linux-doc@...r.kernel.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, <yu.chen.surf@...mail.com>
Subject: Re: [PATCH v5 2/2] sched/numa: add statistics of numa balance task

On 5/25/2025 1:32 AM, Shakeel Butt wrote:
> On Sat, May 24, 2025 at 2:07 AM Chen, Yu C <yu.c.chen@...el.com> wrote:
>>
>> Hi Shakeel,
>>
>> On 5/24/2025 7:42 AM, Shakeel Butt wrote:
>>> On Fri, May 23, 2025 at 08:51:15PM +0800, Chen Yu wrote:
>>>> On systems with NUMA balancing enabled, it has been found
>>>> that tracking task activities resulting from NUMA balancing
>>>> is beneficial. NUMA balancing employs two mechanisms for task
>>>> migration: one is to migrate a task to an idle CPU within its
>>>> preferred node, and the other is to swap tasks located on
>>>> different nodes when they are on each other's preferred nodes.
>>>>
>>>> The kernel already provides NUMA page migration statistics in
>>>> /sys/fs/cgroup/mytest/memory.stat and /proc/{PID}/sched. However,
>>>> it lacks statistics regarding task migration and swapping.
>>>> Therefore, relevant counts for task migration and swapping should
>>>> be added.
>>>>
>>>> The following two new fields:
>>>>
>>>> numa_task_migrated
>>>> numa_task_swapped
>>>>
>>>> will be shown in /sys/fs/cgroup/{GROUP}/memory.stat, /proc/{PID}/sched
>>>> and /proc/vmstat
>>>
>>> Hmm these are scheduler events, how are these relevant to memory cgroup
>>> or vmstat?
>>> Any reason to not expose these in cpu.stat?
>>>
>>
>> I understand that in theory they are scheduling activities.
>> The reason for including these statistics here was mainly that
>> I assumed there is a close relationship between page migration
>> and task migration in Numa Balance. Specifically, task migration
>> is triggered when page migration fails.
>> Placing these statistics closer to the existing Numa Balance page
>> statistics in /sys/fs/cgroup/{GROUP}/memory.stat and /proc/vmstat
>> may help users query relevant data from a single file, avoiding
>> the need to search through scattered files.
>> Notably, these events are associated with a task’s working set
>> (footprint) rather than pure CPU cycles IMO. I took a look at
>> the cpu_cfs_stat_show() for cpu.stat, it seems that a lot of
>> code is needed if we want to expose them in cpu.stat, while
>> reusing existing interface of count_memcg_event_mm() is simpler.
> 
> Let me address two of your points first:
> 
> (1) cpu.stat currently contains cpu cycles stats. I don't see an issue
> adding these new events in it as you can see memory.stat exposes stats
> and events as well.
> 
> (2) You can still use count_memcg_event_mm() and related infra while
> exposing the stats/events in cpu.stat.
> 

Got it.

> Now your point on having related stats within a single interface is
> more convincing. Let me ask you couple of simple questions:
> 
> I am not well versed with numa migration, can you expand a bit more on
> these two events (numa_task_migrated & numa_task_swapped)? How are
> these related to numa memory migration? You mentioned these events
> happen on page migration failure,

I double-checked the code, and it seems that task numa migration
occurs regardless of whether page migration fails or succeeds.

> can you please give an end-to-end
> flow/story of all these events happening on a timeline.
> 

Yes, sure, let me have a try.

The goal of NUMA balancing is to co-locate a task and its
memory pages on the same NUMA node. There are two strategies:
migrate the pages to the task's node, or migrate the task to
the node where its pages reside.
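
As a simplified sketch (not the actual diff), the two new events would
sit next to the existing NUMA balancing vm events; the NUMA_TASK_*
enum names below are only for illustration:

/* include/linux/vm_event_item.h -- sketch only */
#ifdef CONFIG_NUMA_BALANCING
	NUMA_PTE_UPDATES,		/* existing */
	NUMA_HINT_FAULTS,		/* existing */
	NUMA_HINT_FAULTS_LOCAL,		/* existing */
	NUMA_PAGE_MIGRATE,		/* existing: pages moved toward the task */
	NUMA_TASK_MIGRATE,		/* new: task moved to an idle CPU on its
					   preferred node (numa_task_migrated) */
	NUMA_TASK_SWAP,			/* new: cross-node task swap
					   (numa_task_swapped) */
#endif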

Suppose a task p1 is running on Node 0, but its pages are
located on Node 1. NUMA page fault statistics for p1 reveal
its "page footprint" across nodes. If NUMA balancing detects
that most of p1's pages are on Node 1:

1. Page Migration Attempt:
NUMA balancing first tries to migrate p1's pages to Node 0.
The numa_pages_migrated counter increments.

2. Task Migration Strategies:
After the page migration finishes, NUMA balancing checks every
1 second whether p1 can be migrated to Node 1.

Case 2.1: Idle CPU Available
If Node 1 has an idle CPU, p1 is directly scheduled there. This event
is logged as numa_task_migrated.

Case 2.2: No Idle CPU (Task Swap)
If all CPUs on Node 1 are busy, a direct migration could cause CPU
contention or load imbalance. Instead, NUMA balancing selects a
candidate task p2 on Node 1 that prefers Node 0 (e.g., due to its own
page footprint), and p1 and p2 are swapped. This cross-node swap is
recorded as numa_task_swapped.
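
To make the accounting concrete, here is a simplified sketch (not the
actual diff) of where the two counts would be bumped in the task
migration / swap paths, reusing count_memcg_event_mm() as mentioned
above. The helper names are only for illustration, and the per-task
copy shown in /proc/{PID}/sched needs its own field, omitted here:

/* kernel/sched/core.c -- sketch only */

/* Case 2.1: p is moved to an idle CPU on its preferred node. */
static void account_numa_task_migrate(struct task_struct *p)
{
	count_vm_numa_event(NUMA_TASK_MIGRATE);		/* /proc/vmstat */
	count_memcg_event_mm(p->mm, NUMA_TASK_MIGRATE);	/* memory.stat  */
}

/* Case 2.2: p and a task on the other node trade CPUs. */
static void account_numa_task_swap(struct task_struct *p)
{
	count_vm_numa_event(NUMA_TASK_SWAP);
	count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
}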

> Beside that, do you think there might be some other scheduling events
> (maybe unrelated to numa balancing) which might be suitable for
> memory.stat? Basically I am trying to find if having sched events in
> memory.stat be an exception for numa balancing or more general.

If the criterion is a combination of task scheduling strategy and
page-based operations, I cannot find any other existing scheduling
events. For now, NUMA balancing seems to be the only case.


thanks,
Chenyu

> 
> thanks,
> Shakeel
