linux-kernel - Re: [RFC patch v3 01/20] sched: Cache aware load-balancing


Message-ID:
 <SI2PR04MB49319E2695AFFD2DE74B2740E345A@SI2PR04MB4931.apcprd04.prod.outlook.com>
Date: Fri, 27 Jun 2025 10:13:02 +0800
From: Jianyong Wu <jianyong.wu@...look.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>, "Chen, Yu C" <yu.c.chen@...el.com>
Cc: Juri Lelli <juri.lelli@...hat.com>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 Tim Chen <tim.c.chen@...el.com>, Vincent Guittot
 <vincent.guittot@...aro.org>, Libo Chen <libo.chen@...cle.com>,
 Abel Wu <wuyun.abel@...edance.com>,
 Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
 Hillf Danton <hdanton@...a.com>, Len Brown <len.brown@...el.com>,
 linux-kernel@...r.kernel.org, "Gautham R . Shenoy" <gautham.shenoy@....com>,
 Ingo Molnar <mingo@...hat.com>, K Prateek Nayak <kprateek.nayak@....com>,
 Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC patch v3 01/20] sched: Cache aware load-balancing

Hi Tim, Chen,

On 6/27/2025 8:10 AM, Tim Chen wrote:
> On Thu, 2025-06-26 at 21:32 +0800, Chen, Yu C wrote:
>>
>>>
>>> This task work may take a long time on a system with a large number of
>>> CPUs, which increases the delay before the process returns to userspace.
>>> It may be the reason that the schbench benchmark regressed so much.
>>>
>>
>> Thanks for the insight Jianyong, yes, the scan on all online CPUs would
>> be costly.
>>
>>> To avoid searching the whole system, what about just searching the
>>> preferred NUMA node provided by NUMA balancing, if there is one? If not,
>>> then fall back to searching the whole system, or just search the NUMA
>>> node where the main process is located, as there is a high probability
>>> that it contains the preferred LLC. In other words, we can opt for a
>>> suboptimal LLC location to prioritize speed.
>>>
>>> WDYT?
>>>
>> This is a good idea. Previously, Tim had a version that dealt with a
>> similar scenario, which only scanned the CPUs within p's preferred node.
> 
> Yes, we were also thinking along the line of looking only at the preferred
> node.
> 
>>    However, it seems to cause bouncing of the mm->mm_sched_cpu because we
>> set a 2X threshold for switching the mm->mm_sched_cpu in patch 5. If the
>> old mm_sched_cpu is not in p's current preferred node, last_m_a_occ is
>> always 0, which makes the switching of mm->mm_sched_cpu always succeed
>> due to the condition if (m_a_occ > (2 * last_m_a_occ)).
>>
> There were some regressions on schbench during our tests, and the
> preferred LLC bounces a lot when only the preferred node is scanned, as
> mentioned by Chen Yu.  For schbench, there's really not much NUMA data and
> the preferred node bounces around. We'll have to figure out the right
> thing to do if the preferred node changes and the preferred LLC falls
> outside the preferred node.
> 
> Tim
> 
>> Anyway, since it
>> is a software issue, we can find a way to address it.
>>
>> Maybe we could also follow Abel's suggestion that only one thread of
>> the process is allowed to perform the statistics calculation; this
>> could minimize the negative impact on the whole process.
>>
> 
> 
Thanks for the explanation. Got it.
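
For reference, the bouncing Chen Yu describes can be reproduced with a
minimal user-space sketch. This is not the code from the series; it only
models the condition quoted above. The names m_a_occ, last_m_a_occ and the
2X check come from the discussion; pick_sched_cpu(), cpu_to_node_model(),
the per-CPU occ[] array and the 2-node, 8-CPU topology are hypothetical and
exist only for illustration (occupancy is tracked per CPU here purely to
keep the model small).

```c
#include <assert.h>

/* Hypothetical user-space model, not the kernel code from the series. */
#define NR_CPUS        8
#define CPUS_PER_NODE  4	/* assumed topology: 2 nodes x 4 CPUs */

/* Hypothetical helper: map a CPU to its node in the assumed topology. */
static int cpu_to_node_model(int cpu)
{
	return cpu / CPUS_PER_NODE;
}

/*
 * Model of choosing mm_sched_cpu while scanning only the preferred node.
 * occ[cpu] is the assumed occupancy of the process on each CPU,
 * cur is the current mm_sched_cpu.
 */
static int pick_sched_cpu(const unsigned long occ[NR_CPUS], int node, int cur)
{
	unsigned long m_a_occ = 0, last_m_a_occ = 0;
	int best = cur;

	assert(node >= 0 && node < NR_CPUS / CPUS_PER_NODE);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu_to_node_model(cpu) != node)
			continue;	/* CPUs outside the preferred node are never scanned */
		if (occ[cpu] > m_a_occ) {
			m_a_occ = occ[cpu];
			best = cpu;
		}
		if (cpu == cur)
			last_m_a_occ = occ[cpu];	/* only seen if cur is in this node */
	}

	/*
	 * 2X threshold from the discussion. If cur is outside the scanned
	 * node, last_m_a_occ is still 0, so any non-zero occupancy wins.
	 */
	if (m_a_occ > (2 * last_m_a_occ))
		return best;
	return cur;
}
```

With occ = {100, 0, 0, 0, 10, 0, 0, 0} and mm_sched_cpu on CPU 0, a
preferred node of 1 moves mm_sched_cpu to CPU 4 even though its occupancy
is ten times lower, and a preferred node of 0 on the next scan moves it
straight back to CPU 0. The threshold only provides hysteresis when the old
mm_sched_cpu lies inside the node being scanned.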

Thanks
Jianyong