linux-kernel - Re: [RFC][PATCH] sched: Cache aware load-balancing

Message-ID: <457a6070-b34e-4467-8251-f69c4015fccb@intel.com>
Date: Thu, 27 Mar 2025 19:14:02 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: <vineethr@...ux.ibm.com>
CC: <mingo@...nel.org>, <gautham.shenoy@....com>, <kprateek.nayak@....com>,
	<juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
	<dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
	<mgorman@...e.de>, <vschneid@...hat.com>, <linux-kernel@...r.kernel.org>,
	<tim.c.chen@...ux.intel.com>, <tglx@...utronix.de>, Madadi Vineeth Reddy
	<vineethr@...ux.ibm.com>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC][PATCH] sched: Cache aware load-balancing

Hi Madadi,

On 3/27/2025 10:43 AM, Madadi Vineeth Reddy wrote:
> Hi Peter,
> 
> On 25/03/25 17:39, Peter Zijlstra wrote:
>> Hi all,
>>
>> One of the many things on the eternal todo list has been finishing the
>> below hackery.
>>
>> It is an attempt at modelling cache affinity -- and while the patch
>> really only targets LLC, it could very well be extended to also apply to
>> clusters (L2). Specifically any case of multiple cache domains inside a
>> node.
>>
>> Anyway, I wrote this about a year ago, and I mentioned this at the
>> recent OSPM conf where Gautham and Prateek expressed interest in playing
>> with this code.
>>
>> So here goes, very rough and largely unproven code ahead :-)
>>
>> It applies to current tip/master, but I know it will fail the __percpu
>> validation that sits in -next, although that shouldn't be terribly hard
>> to fix up.
>>
>> As is, it only computes a CPU inside the LLC that has the highest recent
>> runtime, this CPU is then used in the wake-up path to steer towards this
>> LLC and in task_hot() to limit migrations away from it.
>>
>> More elaborate things could be done, notably there is an XXX in there
>> somewhere about finding the best LLC inside a NODE (interaction with
>> NUMA_BALANCING).
> 
> Tested the patch on a 12-core, 96-thread Power10 system using a real-life
> workload, DayTrader.

Do all the cores share the same LLC within one node? If so, the
regression might be due to over-migration/task stacking within a single
LLC/node. IMO this patch should be modified so that cache-aware load
balancing/wakeup is not triggered when there is only one LLC within the
node.
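
To make the suggestion concrete, here is a minimal user-space sketch of
such a guard. It is not code from the patch: struct node_topo, nr_llcs,
cache_aware_enabled() and select_wake_cpu() are hypothetical names used
only for illustration, and a real implementation would have to derive
the LLC count from the scheduler's own topology data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified topology description; not the kernel's sched_domain. */
struct node_topo {
	int nr_llcs;	/* number of last-level-cache domains in this NUMA node */
};

/*
 * Return true only when the node has more than one LLC, i.e. when there
 * is actually a choice of LLC to steer a task towards.
 */
static bool cache_aware_enabled(const struct node_topo *node)
{
	return node != NULL && node->nr_llcs > 1;
}

/*
 * Hypothetical wake-up helper: use the preferred-LLC CPU only when the
 * guard passes and a preferred CPU is set (>= 0); otherwise keep the
 * CPU the default path would have chosen.
 */
static int select_wake_cpu(const struct node_topo *node,
			   int default_cpu, int preferred_cpu)
{
	if (!cache_aware_enabled(node) || preferred_cpu < 0)
		return default_cpu;
	return preferred_cpu;
}
```

With a single LLC per node the helper always falls back to the default
CPU, so the cache-aware path adds no extra migrations in that case.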

thanks,
Chenyu

> 
> Here is a summary of the runs:
> 
> Users | Instances | Throughput vs Base | Avg Resp. Time vs Base
> ----------------------------------------------------------------
> 30    | 1         | -25.3%             | +50%
> 60    | 1         | -25.1%             | +50%
> 30    | 3         | -22.8%             | +33%
> 
> As of now, the patch negatively impacts performance both in terms of
> throughput and latency.
> 
> I will conduct more extensive testing with both microbenchmarks and
> real-life workloads.
> 
> Thanks,
> Madadi Vineeth Reddy
>