linux-kernel - Re: [RFC PATCH 0/5] sched/fair: Idle and newidle balancing using push mechanism.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <65eddd6b-c635-4ae6-b9a7-fc62cf44caa3@amd.com>
Date: Wed, 9 Apr 2025 16:47:42 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
	<vincent.guittot@...aro.org>, <linux-kernel@...r.kernel.org>
CC: Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
	<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
	<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, "Gautham R.
 Shenoy" <gautham.shenoy@....com>, Swapnil Sapkal <swapnil.sapkal@....com>
Subject: Re: [RFC PATCH 0/5] sched/fair: Idle and newidle balancing using push
 mechanism.

Missed the most important part ...

Subject: sched/fair: Idle and newidle balancing using push mechanism.

-- 
Thanks and Regards,
Prateek

On 4/9/2025 4:45 PM, K Prateek Nayak wrote:
> Hello everyone,
> 
> There was some interest at OSPM'25 to explore using the push task
> mechanism for idle and newidle balance. This series implements one such
> idea. The main reason for the RFC is to understand if this is the
> implementation people were in favor of before trying to optimize it for
> all the workloads from my test setup.
> 
> Note: The current performance of the prototype is rough. I haven't
> optimized it yet since I would love some feedback first on the approach.
> 
> 
> Current approach
> ================
> 
> The push task framework for fair class has been cherry-pick from
> Vincent's series and has been implemented for !EAS case.
> 
> This series implements the idea from Valentin [2] where, in presence of
> pushable tasks, the CPU will set itself on a per-LLC "overloaded_mask".
> 
> The inter-NUMA newidle balance has been modified to traverse the CPUs
> set on the overloaded mask, first in the local-LLC, and then CPUs set on
> overloaded mask of other LLCs in same NUMA node with the goal of pulling
> a single task towards itself rather than performing a full fledged load
> balancing.
> 
> This implements some of the ideas from David Vernet's SAHRED_RUNQ
> prototype [3] except, instead of a single SHARED_RUNQ per-LLC /
> per-shard, the overloaded mask serves an indicator of per-CPU rq(s)
> containing pushable task that can be migrated to the CPU going idle.
> This avoids having a per-SHARED_RUNQ lock at the expense of maintaining
> the overloaded cpumask.
> 
> The push callback itself has been modified to try push the tasks on the
> pushable task list to one of the CPUs on the "nohz.idle_cpus_mask"
> taking the load off of idle balancing.
> 
> 
> Clarification required
> ======================
> 
> I believe using the per-CPU pushable task list as a proxy for a single
> SHARED_RUNQ was the idea Peter was implying during the discussion. Is
> this correct or did I completely misunderstand it? P.S. SHARED_RUNQ
> could also be modelled as a large per-LLC push list.
> 
> An alternate implementation is to allow CPUs to go to idle as quickly as
> possible and then rely completely on push mechanism and the
> "idle_cpu_mask" to push task to an idle CPU however this puts the burden
> of moving tasks on a busy overloaded CPU which may not be ideal.
> 
> Since folks mentioned using "push mechanism" for newidle balance, was
> the above idea the one they had in mind?
> 
> There seems to be some clear advantage from doing a complete balance in
> the newidle path. Since the schedstats are not rigged up yet for the new
> approach, I'm not completely sure where the advantages vs disadvantages
> are currently.
> 
> If the current approach is right, I'll dig deeper to try address all the
> shortcomings of this prototype.
> 
> Systems with unified LLC will likely run into bottlenecks to maintain a
> large per-LLC mask that can have multiple concurrent updates. I have
> plans to implement a "sd_shard" which shards the large LLC making the
> cpumask maintenance less heavy on these systems.
> 
> 
> References
> ==========
> 
> [1] https://lore.kernel.org/lkml/20250302210539.1563190-6-vincent.guittot@linaro.org/
> [2] https://lore.kernel.org/lkml/xhsmh1putoxbz.mognet@vschneid-thinkpadt14sgen2i.remote.csb/
> [3] https://lore.kernel.org/lkml/20231212003141.216236-1-void@manifault.com/
> 
> --
> K Prateek Nayak (4):
>    sched/fair: Introduce overloaded_mask in sched_domain_shared
>    sched/fair: Update overloaded mask in presence of pushable task
>    sched/fair: Rework inter-NUMA newidle balancing
>    sched/fair: Proactive idle balance using push mechanism
> 
> Vincent Guittot (1):
>    sched/fair: Add push task framework
> 
>   include/linux/sched/topology.h |   1 +
>   kernel/sched/fair.c            | 297 +++++++++++++++++++++++++++++++--
>   kernel/sched/sched.h           |   2 +
>   kernel/sched/topology.c        |  25 ++-
>   4 files changed, 306 insertions(+), 19 deletions(-)
> 
> 
> base-commit: 6432e163ba1b7d80b5876792ce53e511f041ab91