Message-ID: <c1b24dd5-dce9-61ed-baba-a70f08276bf5@arm.com>
Date: Mon, 20 Jul 2020 10:47:42 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: chris hyser <chris.hyser@...cle.com>,
Parth Shah <parth@...ux.ibm.com>,
Patrick Bellasi <patrick.bellasi@...bug.net>,
LKML <linux-kernel@...r.kernel.org>
Cc: Ingo Molnar <mingo@...hat.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Paul Turner <pjt@...gle.com>, Ben Segall <bsegall@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Jonathan Corbet <corbet@....net>,
Dhaval Giani <dhaval.giani@...cle.com>,
Josef Bacik <jbacik@...com>
Subject: Re: [SchedulerWakeupLatency] Skipping Idle Cores and CPU Search
On 10/07/2020 01:08, chris hyser wrote:
[...]
>> D) Desired behavior:
>
> Reduce the maximum wake-up latency of designated CFS tasks by skipping
> some or all of the idle CPU and core searches by setting a maximum idle
> CPU search value (maximum loop iterations).
>
> Searching 'ALL' as the maximum would be the default and implies the
> current code path which may or may not search up to ALL. Searching 0
> would result in the least latency (shown with experimental results to be
> included if/when patchset goes up). One of the considerations is that
> the maximum length of the search is a function of the size of the LLC
> scheduling domain and this is platform dependent. Whether 'some', i.e. a
> numerical value limiting the search can be used to "normalize" this
> latency across differing scheduling domain sizes is under investigation.
> Clearly differing hardware will have many other significant differences,
> but in different sized and dynamically sized VMs running on fleets of
> common HW this may be interesting.
I assume that this task-specific feature could coexist in
select_idle_core() and select_idle_cpu() with the already existing
runtime heuristics (test_idle_cores() and the two sched features
mentioned under E/F) to reduce the idle CPU search space on a busy system.
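One way such coexistence could look, as a rough userspace sketch rather
than kernel code: compute a proportional budget in the spirit of SIS_PROP
and let the per-task limit cap it. The function names and the rule of
taking the minimum of the two limits are assumptions for illustration
only; nothing in this thread proposes that exact rule.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Simplified model of a SIS_PROP-style budget (assumption, not kernel
 * code): scale the LLC domain size by the ratio of average idle time to
 * average search cost, with a floor of 4 iterations.
 */
static int heuristic_search_limit(uint64_t span_weight, uint64_t avg_idle,
                                  uint64_t avg_cost)
{
    uint64_t span_avg = span_weight * avg_idle;

    /* No cost estimate yet, or result too large: do not restrict. */
    if (avg_cost == 0 || span_avg / avg_cost > INT_MAX)
        return INT_MAX;

    if (span_avg > 4 * avg_cost)
        return (int)(span_avg / avg_cost);

    return 4;
}

/*
 * Hypothetical combination rule: the per-task knob can only shrink the
 * search, never extend it beyond what the runtime heuristic allows.
 */
static int effective_search_limit(int task_limit, int heuristic_limit)
{
    return task_limit < heuristic_limit ? task_limit : heuristic_limit;
}
```

With this rule a task left at the default ('ALL', i.e. INT_MAX) sees
exactly the heuristic's budget, while a task set to 0 skips the search
regardless of how idle the system is.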
>> E/F) Existing knobs (and limitations):
>
> There are existing sched_feat: SIS_AVG_CPU, SIS_PROP that attempt to
> short circuit the idle cpu search path in select_idle_cpu() based on
> estimations of the current costs of searching. Neither provides a means
[...]
>> H) Range Analysis:
>
> The knob is a positive integer representing "max number of CPUs to
> search". The default would be 'ALL' which could be translated as
> INT_MAX. '0 searches' translates to 0. Other values represent a max
> limit on the search, in this case iterations of a for loop.
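To make the range concrete, here is a minimal userspace model of a
bounded idle CPU search. It is a sketch only: bounded_idle_search(),
SIS_SEARCH_ALL and the idle[] array are hypothetical names standing in
for the real loop in select_idle_cpu(), and the wrap-around scan order
is an assumption.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical knob value meaning "search ALL" (the default). */
#define SIS_SEARCH_ALL INT_MAX

/*
 * Scan idle[] starting at 'target' and wrapping around, visiting at most
 * max_search entries. Returns the first idle CPU found, or -1 if the
 * budget runs out or no CPU is idle.
 */
static int bounded_idle_search(const bool *idle, int nr_cpus, int target,
                               int max_search)
{
    int budget = max_search;

    for (int i = 0; i < nr_cpus; i++) {
        int cpu = (target + i) % nr_cpus;

        /* A budget of 0 returns here before any CPU is inspected. */
        if (budget-- <= 0)
            return -1;

        if (idle[cpu])
            return cpu;
    }

    return -1;
}
```

In this model 0 never enters the search, SIS_SEARCH_ALL is bounded only
by nr_cpus, and any value in between caps the number of loop iterations.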
IMHO the opposite use case for this feature (favouring high throughput
over short wakeup latency, as at Facebook) is already covered by the
changes introduced by commit 10e2f1acd010 ("sched/core: Rewrite and
improve select_idle_siblings()"), i.e. by the current implementation of
sis().
It seems that they don't need an additional per-task feature on top of
the default system-wide runtime heuristics.
[...]