lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGETcx830PZyr_oZAkghR=CLsThLUX1hZRxrNK_FNSLuF2TBAw@mail.gmail.com>
Date: Thu, 7 Nov 2024 23:28:07 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: Ingo Molnar <mingo@...hat.com>, "Peter Zijlstra (Intel)" <peterz@...radead.org>, 
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Benjamin Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, LKML <linux-kernel@...r.kernel.org>, 
	wuyun.abel@...edance.com, youssefesmat@...omium.org, 
	Thomas Gleixner <tglx@...utronix.de>, efault@....de, 
	K Prateek Nayak <kprateek.nayak@....com>, John Stultz <jstultz@...gle.com>, 
	Vincent Palomares <paillon@...gle.com>
Subject: Very high scheduling delay with plenty of idle CPUs

Hi scheduler folks,

I'm running into some weird scheduling issues when testing non-sched
changes on a Pixel 6 that's running close to 6.12-rc5. I'm not sure if
this is an issue in earlier kernel versions or not.

The async suspend/resume code calls async_schedule_dev_nocall() to
queue up a bunch of work. These queued up work seem to be running in
kworker threads.

However, there have been many times where I see scheduling latency
(runnable, but not running) of 4.5 ms or higher for a kworker thread
when there are plenty of idle CPUs.

Does async_schedule_dev_nocall() have some weird limitations on where
they can be run? I know it has some NUMA related stuff, but the Pixel
6 doesn't have NUMA. This oddity unnecessarily increases
suspend/resume latency as it adds up across kworker threads. So, I'd
appreciate any insights on what might be happening?

If you know how to use perfetto (it's really pretty simple, all you
need to know is WASD and clicking), here's an example:
https://ui.perfetto.dev/#!/?s=e20045736e7dfa1e897db6489710061d2495be92

Thanks,
Saravana

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ