Message-ID: <31622970-62e2-020a-b802-9b961a7db03d@linux.ibm.com>
Date: Sun, 18 Feb 2024 14:57:17 +0530
From: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Tim Chen <tim.c.chen@...el.com>, Aaron Lu <aaron.lu@...el.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Chen Yu <yu.chen.surf@...il.com>, linux-kernel@...r.kernel.org,
Juri Lelli <juri.lelli@...hat.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Subject: Re: [PATCH v2 0/3] Introduce SIS_CACHE to choose previous CPU during
task wakeup
Hi Chen Yu,
On 21/11/23 13:09, Chen Yu wrote:
> v1 -> v2:
> - Move the task sleep duration from sched_entity to task_struct. (Aaron Lu)
> - Refine the task sleep duration calculation based on task's previous running
> CPU. (Aaron Lu)
> - Limit the cache-hot idle CPU scan depth to reduce the time spent on
> searching, to fix the regression. (K Prateek Nayak)
> - Add test results of real-life workloads, per request from Ingo:
> Daytrader on a power system (Madadi Vineeth Reddy) and
> an OLTP workload on Xeon Sapphire Rapids.
> - Refined the commit log, added Reviewed-by tag to PATCH 1/3
> (Mathieu Desnoyers).
>
> RFC -> v1:
> - drop RFC
> - Only record the short sleeping time for each task, to better honor the
> burst sleeping tasks. (Mathieu Desnoyers)
> - Keep the forward movement monotonic for runqueue's cache-hot timeout value.
> (Mathieu Desnoyers, Aaron Lu)
> - Introduce a new helper function cache_hot_cpu() that considers
> rq->cache_hot_timeout. (Aaron Lu)
> - Add analysis of why inhibiting task migration could bring better throughput
> for some benchmarks. (Gautham R. Shenoy)
> - Choose the first cache-hot CPU if all idle CPUs are cache-hot in
> select_idle_cpu(), to avoid possible task stacking on the waker's CPU.
> (K Prateek Nayak)
>
> Thanks for the comments and tests!
>
> ----------------------------------------------------------------------
>
> This series aims to continue the discussion of how to make it easier
> for the wakee to choose its previous CPU.
>
> When task p is woken up, the scheduler leverages select_idle_sibling()
> to find an idle CPU for it. p's previous CPU is usually a preference
> because it can improve cache locality. However in many cases, the
> previous CPU has already been taken by other wakees, thus p has to
> find another idle CPU.
>
> Inhibiting task migration could benefit many workloads. Inspired by
> Mathieu's proposal to limit the task migration ratio[1], introduce
> the SIS_CACHE. It considers the sleep time of the task for better
> task placement. Based on the task's short sleeping history, tag p's
> previous CPU as cache-hot. Later when p is woken up, it can choose
> its previous CPU in select_idle_sibling(). When another task is
> woken up, skip this cache-hot idle CPU and try the next idle CPU
> when possible. The idea of SIS_CACHE is to optimize the idle CPU
> scan sequence. The extra scan time is minimized by restricting the
> scan depth of cache-hot CPUs to 50% of the scan depth of SIS_UTIL.
>
> This test is based on tip/sched/core, on top of
> Commit ada87d23b734
> ("x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram")
>
> This patch set has shown 15% ~ 70% improvements for client/server
> workloads like netperf and tbench. It shows a 0.7% improvement in
> OLTP, with 0.2% run-to-run variation, on a Xeon system with 240 CPUs.
> There is a 2% improvement in another real-life workload, Daytrader,
> per Madadi's test on a power system with 96 CPUs. Prateek
> has helped check that there is no obvious microbenchmark regression
> with v2 on a 3rd Generation EPYC system with 128 CPUs.
>
> Link: https://lore.kernel.org/lkml/20230905171105.1005672-2-mathieu.desnoyers@efficios.com/ #1
>
> Chen Yu (3):
> sched/fair: Record the task sleeping time as the cache hot duration
> sched/fair: Calculate the cache-hot time of the idle CPU
> sched/fair: skip the cache hot CPU in select_idle_cpu()
>
> include/linux/sched.h | 4 ++
> kernel/sched/fair.c | 88 +++++++++++++++++++++++++++++++++++++++--
> kernel/sched/features.h | 1 +
> kernel/sched/sched.h | 1 +
> 4 files changed, 91 insertions(+), 3 deletions(-)
>
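For anyone following the thread, the scan order described in the quoted cover letter can be illustrated with a rough user-space sketch. This is not the code from the series. The struct, the function names mark_cache_hot() and select_idle_cpu_sketch(), the fixed NR_CPUS, and the exact handling of the 50% budget are all assumptions made for illustration; only cache_hot_cpu() and cache_hot_timeout are names taken from the changelog above.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8 /* hypothetical small system, for illustration only */

/* Hypothetical per-CPU state; the field name follows rq->cache_hot_timeout. */
struct cpu_state {
	bool idle;
	uint64_t cache_hot_timeout; /* CPU is treated as cache-hot until this time */
};

/*
 * When a task goes to sleep, tag its CPU as cache-hot for roughly the
 * task's short sleep duration. The timeout only ever moves forward.
 */
static void mark_cache_hot(struct cpu_state *cpu, uint64_t now,
			   uint64_t short_sleep)
{
	uint64_t timeout = now + short_sleep;

	if (timeout > cpu->cache_hot_timeout)
		cpu->cache_hot_timeout = timeout;
}

static bool cache_hot_cpu(const struct cpu_state *cpu, uint64_t now)
{
	return now < cpu->cache_hot_timeout;
}

/*
 * Scan up to nr CPUs starting at target and return the first idle CPU
 * that is not cache-hot. At most nr / 2 cache-hot idle CPUs are skipped
 * (assumed reading of "50% of the scan depth"). If nothing better is
 * found, fall back to the first cache-hot idle CPU seen, or -1 if no
 * idle CPU was found at all.
 */
static int select_idle_cpu_sketch(const struct cpu_state cpus[], int target,
				  int nr, uint64_t now)
{
	int hot_budget = nr / 2;
	int hot_seen = 0;
	int first_hot = -1;

	for (int i = 0; i < NR_CPUS && i < nr; i++) {
		int cpu = (target + i) % NR_CPUS;

		if (!cpus[cpu].idle)
			continue;

		if (cache_hot_cpu(&cpus[cpu], now)) {
			if (first_hot < 0)
				first_hot = cpu;
			if (++hot_seen >= hot_budget)
				break; /* budget used up, stop skipping */
			continue;
		}

		return cpu;
	}

	return first_hot;
}
```

In this model, a wakee other than the previous owner passes over an idle CPU while its timeout is still in the future, so the owner is more likely to find it free when it wakes up shortly afterwards.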
Is there any update or progress on this patch?
I have been working on a patch that improves scheduler performance on Power10 by changing
the order in which domains are accessed for CPU selection during wakeup. It turns out
that this patch helps in that regard: my patch gives better performance when applied on
top of it.
So I am looking forward to hearing about the progress/status of this patch.
Thanks and Regards
Madadi Vineeth Reddy