Message-ID: <0d7fa00e-587e-4ac8-90d0-115f30fdf0ac@linux.ibm.com>
Date: Tue, 1 Apr 2025 01:47:26 +0530
From: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
To: "Chen, Yu C" <yu.c.chen@...el.com>
Cc: mingo@...nel.org, gautham.shenoy@....com, kprateek.nayak@....com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
tim.c.chen@...ux.intel.com, tglx@...utronix.de,
Peter Zijlstra <peterz@...radead.org>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Subject: Re: [RFC][PATCH] sched: Cache aware load-balancing

Hi Chen Yu,

On 27/03/25 16:44, Chen, Yu C wrote:
> Hi Madadi,
>
> On 3/27/2025 10:43 AM, Madadi Vineeth Reddy wrote:
>> Hi Peter,
>>
>> On 25/03/25 17:39, Peter Zijlstra wrote:
>>> Hi all,
>>>
>>> One of the many things on the eternal todo list has been finishing the
>>> below hackery.
>>>
>>> It is an attempt at modelling cache affinity -- and while the patch
>>> really only targets LLC, it could very well be extended to also apply to
>>> clusters (L2). Specifically any case of multiple cache domains inside a
>>> node.
>>>
>>> Anyway, I wrote this about a year ago, and I mentioned this at the
>>> recent OSPM conf where Gautham and Prateek expressed interest in playing
>>> with this code.
>>>
>>> So here goes, very rough and largely unproven code ahead :-)
>>>
>>> It applies to current tip/master, but I know it will fail the __percpu
>>> validation that sits in -next, although that shouldn't be terribly hard
>>> to fix up.
>>>
>>> As is, it only computes a CPU inside the LLC that has the highest recent
>>> runtime, this CPU is then used in the wake-up path to steer towards this
>>> LLC and in task_hot() to limit migrations away from it.
>>>
>>> More elaborate things could be done, notably there is an XXX in there
>>> somewhere about finding the best LLC inside a NODE (interaction with
>>> NUMA_BALANCING).
>>
>> Tested the patch on a 12-core, 96-thread Power10 system using a real-life
>> workload, DayTrader.
>
> Do all the cores share the same LLC within one node? If that is the
> case, the regression might be due to over-migration/task stacking
> within one LLC/node. IMO, this patch should be modified so that
> cache-aware load balancing/wakeup is not triggered if there is only
> one LLC within the node.

Are you asking whether the LLC is shared at the node level?
In Power10, the LLC is at the small core level, covering 4 threads.
In my test setup, there were 4 nodes, each with 24 CPUs, meaning there
were 6 LLCs per node.
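
If it helps, a rough sketch of the guard you are suggesting could look
something like this (the helper name and placement are only illustrative,
not code from the patch; sd_llc_size and cpumask_of_node() are the
existing kernel helpers):

	/*
	 * Skip cache-aware steering when a node has only one LLC:
	 * if every CPU of the node already shares that LLC, steering
	 * within the node cannot help and only risks stacking tasks.
	 */
	static bool node_has_multiple_llcs(int cpu)
	{
		int node_cpus = cpumask_weight(cpumask_of_node(cpu_to_node(cpu)));
		int llc_cpus  = per_cpu(sd_llc_size, cpu);

		return llc_cpus < node_cpus;
	}

On this Power10 box such a check would still return true (24 CPUs per
node vs. 4 CPUs per LLC), so the steering would stay active here, and
the regression is probably not just the single-LLC case you describe.
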
I went through the patch in more detail and will check whether task
stacking is an issue using micro-benchmarks.
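
For what it is worth, my rough reading of the steering is the following
simplified sketch of the idea, not the patch's actual code
(llc_hottest_cpu() is a made-up helper here):

	/*
	 * Sketch: pick the CPU with the highest recent runtime for this
	 * workload and bias wakeups towards its LLC; task_hot() then
	 * resists migrating tasks back out of that LLC.
	 */
	static int cache_aware_target(struct task_struct *p, int prev_cpu)
	{
		int hot = llc_hottest_cpu(p);	/* hypothetical helper */

		if (hot >= 0 && !cpus_share_cache(hot, prev_cpu))
			return hot;	/* steer the wakeup into the hot LLC */

		return prev_cpu;	/* already in (or no) preferred LLC */
	}

If many wakeups of the same workload keep resolving to that one LLC
while the other five LLCs in the node stay idle, that is the stacking I
want to catch with the micro-benchmarks.
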
Thanks for your feedback.

Thanks,
Madadi Vineeth Reddy
>
> thanks,
> Chenyu
>
>>
>> Here is a summary of the runs:
>>
>> Users | Instances | Throughput vs Base | Avg Resp. Time vs Base
>> ------|-----------|--------------------|-----------------------
>>    30 |         1 |             -25.3% |                   +50%
>>    60 |         1 |             -25.1% |                   +50%
>>    30 |         3 |             -22.8% |                   +33%
>>
>> As of now, the patch negatively impacts performance in terms of both
>> throughput and latency.
>>
>> I will conduct more extensive testing with both microbenchmarks and
>> real-life workloads.
>>
>> Thanks,
>> Madadi Vineeth Reddy
>>