Message-ID: <8f6c7c69-b6b3-4c82-8db3-96757f09245f@linux.ibm.com>
Date: Thu, 10 Jul 2025 01:09:14 +0530
From: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Tim Chen <tim.c.chen@...el.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Libo Chen <libo.chen@...cle.com>, Abel Wu <wuyun.abel@...edance.com>,
Hillf Danton <hdanton@...a.com>, Len Brown <len.brown@...el.com>,
linux-kernel@...r.kernel.org, Chen Yu <yu.c.chen@...el.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Subject: Re: [RFC patch v3 00/20] Cache aware scheduling
On 18/06/25 23:57, Tim Chen wrote:
> This is the third revision of the cache aware scheduling patches,
> based on the original patch proposed by Peter[1].
>
> The goal of the patch series is to aggregate tasks sharing data
> to the same cache domain, thereby reducing cache bouncing and
> cache misses, and improving data access efficiency. In the current
> implementation, threads within the same process are considered
> entities that potentially share resources.
[..snip..]
>
> Comments and tests are much appreciated.
When running ebizzy as below:
ebizzy -t 8 -S 10
I see a ~24% throughput degradation on the patched kernel, driven by
higher SMT2 and SMT4 mode cycles compared to the baseline, while ST
(single-thread) mode cycles decreased.
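For reference, the SMT-mode cycle split and IPC can be collected with
perf along these lines; the PM_RUN_CYC_*_MODE event names are taken
from the Power PMU event lists and may differ between processor
generations, so treat the exact names as an assumption:

  # Sketch of the measurement; PMU event names vary by Power generation.
  perf stat -e cycles,instructions \
            -e PM_RUN_CYC_ST_MODE,PM_RUN_CYC_SMT2_MODE,PM_RUN_CYC_SMT4_MODE \
            -- ebizzy -t 8 -S 10

The cycles/instructions pair gives the IPC figure mentioned below.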
Since both P10 and P11 share the LLC at the SMT4 level, even spawning
fewer threads easily crowds the LLC with the default llc_aggr_cap value
of 50. Increasing this value would likely make things worse, while
decreasing it to 25 effectively disables cache-aware scheduling, since
it limits selection to just one CPU.
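A minimal sketch of the arithmetic as I read it (the actual helper in
the series may differ; 4 CPUs per LLC matches the SMT4 sharing on
P10/P11):

  # Assumed semantics: llc_aggr_cap caps the percentage of an LLC's
  # CPUs that cache-aware scheduling may aggregate tasks onto.
  llc_size=4                              # CPUs per LLC on P10/P11
  for cap in 25 50 75; do
      echo "llc_aggr_cap=$cap -> $(( llc_size * cap / 100 )) CPU(s)"
  done

With llc_aggr_cap=25 this yields a single CPU, which is why it
effectively disables the feature here.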
I understand that ebizzy itself doesn't benefit from cache sharing, so
it might not improve, but here it actually *regresses*. The impact may
be even larger on P10/P11 because of their smaller LLC, shared by just
4 CPUs, so even with fewer threads the IPC drops.
By default, the SCHED_CACHE feature is enabled. Given these results for
workloads that don't share cache, and on systems with smaller LLCs, I
think the default should be revisited.
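For tests like the above it would also help to flip the feature at
runtime; assuming SCHED_CACHE is wired up as a regular sched feature,
it should be controllable through the usual debugfs interface:

  # Assumes SCHED_CACHE is a sched_feat() entry exposed via debugfs.
  echo NO_SCHED_CACHE > /sys/kernel/debug/sched/features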
Thanks,
Madadi Vineeth Reddy
>
> [1] https://lore.kernel.org/all/20250325120952.GJ36322@noisy.programming.kicks-ass.net/
>
> The patches are grouped as follows:
> Patch 1: Peter's original patch.
> Patch 2-5: Various fixes and tuning of the original v1 patch.
> Patch 6-12: Infrastructure and helper functions for load balancing to be cache aware.
> Patch 13-18: Add logic to load balancing for preferred LLC aggregation.
> Patch 19: Add process LLC aggregation in load balancing sched feature.
> Patch 20: Add process LLC aggregation in wake up sched feature (turned off by default).
>
> v1:
> https://lore.kernel.org/lkml/20250325120952.GJ36322@noisy.programming.kicks-ass.net/
> v2:
> https://lore.kernel.org/lkml/cover.1745199017.git.yu.c.chen@intel.com/
>
>
> Chen Yu (3):
> sched: Several fixes for cache aware scheduling
> sched: Avoid task migration within its preferred LLC
> sched: Save the per LLC utilization for better cache aware scheduling
>
> K Prateek Nayak (1):
> sched: Avoid calculating the cpumask if the system is overloaded
>
> Peter Zijlstra (1):
> sched: Cache aware load-balancing
>
> Tim Chen (15):
> sched: Add hysteresis to switch a task's preferred LLC
> sched: Add helper function to decide whether to allow cache aware
> scheduling
> sched: Set up LLC indexing
> sched: Introduce task preferred LLC field
> sched: Calculate the number of tasks that have LLC preference on a
> runqueue
> sched: Introduce per runqueue task LLC preference counter
> sched: Calculate the total number of preferred LLC tasks during load
> balance
> sched: Tag the sched group as llc_balance if it has tasks prefer other
> LLC
> sched: Introduce update_llc_busiest() to deal with groups having
> preferred LLC tasks
> sched: Introduce a new migration_type to track the preferred LLC load
> balance
> sched: Consider LLC locality for active balance
> sched: Consider LLC preference when picking tasks from busiest queue
> sched: Do not migrate task if it is moving out of its preferred LLC
> sched: Introduce SCHED_CACHE_LB to control cache aware load balance
> sched: Introduce SCHED_CACHE_WAKE to control LLC aggregation on wake
> up
>
> include/linux/mm_types.h | 44 ++
> include/linux/sched.h | 8 +
> include/linux/sched/topology.h | 3 +
> init/Kconfig | 4 +
> init/init_task.c | 3 +
> kernel/fork.c | 5 +
> kernel/sched/core.c | 25 +-
> kernel/sched/debug.c | 4 +
> kernel/sched/fair.c | 859 ++++++++++++++++++++++++++++++++-
> kernel/sched/features.h | 3 +
> kernel/sched/sched.h | 23 +
> kernel/sched/topology.c | 29 ++
> 12 files changed, 982 insertions(+), 28 deletions(-)
>