Message-ID: <20251015115445.GR3289052@noisy.programming.kicks-ass.net>
Date: Wed, 15 Oct 2025 13:54:45 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>, Ingo Molnar <mingo@...hat.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Hillf Danton <hdanton@...a.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>,
Jianyong Wu <jianyong.wu@...look.com>,
Yangyu Chen <cyy@...self.name>,
Tingyin Duan <tingyin.duan@...il.com>,
Vern Hao <vernhao@...cent.com>, Len Brown <len.brown@...el.com>,
Aubrey Li <aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>,
Chen Yu <yu.chen.surf@...il.com>, Chen Yu <yu.c.chen@...el.com>,
Libo Chen <libo.chen@...cle.com>,
Adam Li <adamli@...amperecomputing.com>,
Tim Chen <tim.c.chen@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 01/19] sched/fair: Add infrastructure for cache-aware
load balancing

On Wed, Oct 15, 2025 at 12:42:48AM +0530, Madadi Vineeth Reddy wrote:
> > +static void get_scan_cpumasks(cpumask_var_t cpus, int cache_cpu,
> > +                              int pref_nid, int curr_cpu)
> > +{
> > +#ifdef CONFIG_NUMA_BALANCING
> > +        /* First honor the task's preferred node. */
> > +        if (pref_nid != NUMA_NO_NODE)
> > +                cpumask_or(cpus, cpus, cpumask_of_node(pref_nid));
> > +#endif
> > +
> > +        /* Next honor the task's cache CPU if it is not included. */
> > +        if (cache_cpu != -1 && !cpumask_test_cpu(cache_cpu, cpus))
> > +                cpumask_or(cpus, cpus,
> > +                           cpumask_of_node(cpu_to_node(cache_cpu)));
> > +
> > +        /*
> > +         * Lastly make sure that the task's current running node is
> > +         * considered.
> > +         */
> > +        if (!cpumask_test_cpu(curr_cpu, cpus))
> > +                cpumask_or(cpus, cpus, cpumask_of_node(cpu_to_node(curr_cpu)));
> > +}
> > +
> > +static void __no_profile task_cache_work(struct callback_head *work)
> > +{
> > +        struct task_struct *p = current;
> > +        struct mm_struct *mm = p->mm;
> > +        unsigned long m_a_occ = 0;
> > +        unsigned long curr_m_a_occ = 0;
> > +        int cpu, m_a_cpu = -1, cache_cpu,
> > +            pref_nid = NUMA_NO_NODE, curr_cpu;
> > +        cpumask_var_t cpus;
> > +
> > +        WARN_ON_ONCE(work != &p->cache_work);
> > +
> > +        work->next = work;
> > +
> > +        if (p->flags & PF_EXITING)
> > +                return;
> > +
> > +        if (!zalloc_cpumask_var(&cpus, GFP_KERNEL))
> > +                return;
> > +
> > +        curr_cpu = task_cpu(p);
> > +        cache_cpu = mm->mm_sched_cpu;
> > +#ifdef CONFIG_NUMA_BALANCING
> > +        if (static_branch_likely(&sched_numa_balancing))
> > +                pref_nid = p->numa_preferred_nid;
> > +#endif
> > +
> > +        scoped_guard (cpus_read_lock) {
> > +                get_scan_cpumasks(cpus, cache_cpu,
> > +                                  pref_nid, curr_cpu);
> > +
>
> IIUC, `get_scan_cpumasks` ORs together the preferred NUMA node, the cache
> CPU's node, and the current CPU's node. This can result in scanning
> multiple nodes rather than preferring the NUMA-preferred node.
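
For reference, the union behaviour described above can be reproduced with
a minimal userspace sketch (this is not the kernel code: plain 64-bit
masks stand in for cpumask_t, and the 4-CPUs-per-node layout with its
node_mask()/cpu_node() helpers is made up purely for illustration):

#include <stdint.h>
#include <stdio.h>

#define CPUS_PER_NODE   4
#define NUMA_NO_NODE    -1

/* all CPUs of node @nid, as one bitmask */
static uint64_t node_mask(int nid)
{
        return ((1ull << CPUS_PER_NODE) - 1) << (nid * CPUS_PER_NODE);
}

/* node a given CPU belongs to */
static int cpu_node(int cpu)
{
        return cpu / CPUS_PER_NODE;
}

/* mirrors the quoted helper: OR together up to three node masks */
static uint64_t scan_mask(int cache_cpu, int pref_nid, int curr_cpu)
{
        uint64_t cpus = 0;

        if (pref_nid != NUMA_NO_NODE)
                cpus |= node_mask(pref_nid);
        if (cache_cpu != -1 && !(cpus & 1ull << cache_cpu))
                cpus |= node_mask(cpu_node(cache_cpu));
        if (!(cpus & 1ull << curr_cpu))
                cpus |= node_mask(cpu_node(curr_cpu));
        return cpus;
}

int main(void)
{
        /* pref_nid = 0, cache_cpu = 5 (node 1), curr_cpu = 9 (node 2) */
        printf("%#llx\n", (unsigned long long)scan_mask(5, 0, 9));
        /* prints 0xfff: the union of all three nodes, none preferred */
        return 0;
}

The preferred node ends up as just one contributor to the union, which is
exactly the "not preferring" behaviour noted above.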
So this used to be online_mask, and is now magically changed to this
more limited mask.

Could you split this change out and give it its own justification?
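
For context, the split-out hunk being asked for would look roughly like
this (a hedged reconstruction; the earlier revision is not quoted in this
mail, so the exact old line is an assumption):

-       cpumask_copy(cpus, cpu_online_mask);    /* old: scan every online CPU */
+       get_scan_cpumasks(cpus, cache_cpu, pref_nid, curr_cpu);
+       /* new: scan only the preferred/cache/current nodes */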