Message-ID: <cc8831c7-8ea2-0ee7-061f-73352d7832ad@amd.com>
Date: Mon, 11 Nov 2024 10:47:21 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Saravana Kannan <saravanak@...gle.com>, Peter Zijlstra
<peterz@...radead.org>
CC: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann
<dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, Benjamin
Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin Schneider
<vschneid@...hat.com>, LKML <linux-kernel@...r.kernel.org>,
<wuyun.abel@...edance.com>, <youssefesmat@...omium.org>, Thomas Gleixner
<tglx@...utronix.de>, <efault@....de>, John Stultz <jstultz@...gle.com>,
Vincent Palomares <paillon@...gle.com>, Tobias Huschle
<huschle@...ux.ibm.com>
Subject: Re: Very high scheduling delay with plenty of idle CPUs
(+ Tobias)
Hello Saravana,
On 11/10/2024 11:19 AM, Saravana Kannan wrote:
> On Fri, Nov 8, 2024 at 12:31 AM Peter Zijlstra <peterz@...radead.org> wrote:
>>
>> On Thu, Nov 07, 2024 at 11:28:07PM -0800, Saravana Kannan wrote:
>>> Hi scheduler folks,
>>>
>>> I'm running into some weird scheduling issues when testing non-sched
>>> changes on a Pixel 6 that's running close to 6.12-rc5. I'm not sure if
>>> this is an issue in earlier kernel versions or not.
>>>
>>
>> It's a bit unfortunate you don't have a known good kernel there. Anyway,
>> one thing that recently came up is that DELAY_DEQUEUE can cause some
>> delays, specifically it can inhibit wakeup migration.
>
> I disabled DELAY_DEQUEUE and I'm still seeing preemptions or
> scheduling latency (after wakeup)
On the scheduling latency front, have you tried running with
RUN_TO_PARITY and/or PLACE_LAG disabled? If the tick granularity on your
system is less than "base_slice_ns", disabling RUN_TO_PARITY can help
switch to a newly woken task slightly faster. Disabling PLACE_LAG
ensures that a newly woken task is always eligible for selection.
However, both come with the disadvantage of a sharp increase in the
number of involuntary context switches in some of the scenarios we have
tested. There is a separate thread from Cristian making a case for
toggling these features via sysfs and keeping them disabled by
default [0].

[0] https://lore.kernel.org/lkml/20241017052000.99200-1-cpru@amazon.com/
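
In case it is useful, assuming CONFIG_SCHED_DEBUG is enabled and
debugfs is mounted at /sys/kernel/debug, the features can be flipped at
runtime roughly as follows (the "NO_" prefix disables a feature):

  # Inspect the currently enabled scheduler features
  cat /sys/kernel/debug/sched/features

  # Disable RUN_TO_PARITY and PLACE_LAG for the experiment
  echo NO_RUN_TO_PARITY > /sys/kernel/debug/sched/features
  echo NO_PLACE_LAG > /sys/kernel/debug/sched/features

  # Re-enable them once done
  echo RUN_TO_PARITY > /sys/kernel/debug/sched/features
  echo PLACE_LAG > /sys/kernel/debug/sched/features
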
> when there are plenty of CPUs even
> within the same cluster/frequency domain.
I'm not aware of any recent EAS-specific changes that could have led to
larger scheduling latencies, but Tobias had reported a similar increase
in kworker scheduling latency, in a different context, when EEVDF was
first introduced [1]. I'm not sure if he is still observing the same
behavior on current upstream, but would it be possible to check whether
you see the large scheduling latency only starting with v6.6 (when
EEVDF was introduced) and not on v6.5 (which still ran the older CFS
logic)? I'm also assuming the system / benchmark does not change the
default scheduler-related debug tunables, some of which went away in
v6.6.

[1] https://lore.kernel.org/lkml/c7b38bc27cc2c480f0c5383366416455@linux.ibm.com/
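
For reference, the change in tunables can be seen from debugfs
(assuming it is mounted at /sys/kernel/debug; paths from memory, worth
double-checking on your tree):

  # v6.5 and earlier (CFS): the granularity knobs exist
  cat /sys/kernel/debug/sched/min_granularity_ns
  cat /sys/kernel/debug/sched/wakeup_granularity_ns

  # v6.6 and later (EEVDF): the above are gone and the base slice
  # takes their place
  cat /sys/kernel/debug/sched/base_slice_ns
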
>
> Can we tell the scheduler to just spread out all the tasks during
> suspend/resume? Doesn't make a lot of sense to try and save power
> during a suspend/resume. It's almost always cheaper/better to do those
> quickly.
Wouldn't that increase the resume latency, since each runnable task
would need to go through a full idle CPU selection cycle? Isn't time a
consideration / concern in the resume path? Also, unless we go through
the slow path, it is very likely we'll end up making the same task
placement decisions again.
>
> -Saravana
>
>>
>> You can either test with that feature turned off, or apply something
>> like the following patch:
>>
>> https://lkml.kernel.org/r/20241106135346.GL24862@noisy.programming.kicks-ass.net
--
Thanks and Regards,
Prateek