Message-ID: <CAD=FV=XuvhaQPoLN7q5JnraBGggN90aXPoSEFG-H80i368u5Xg@mail.gmail.com>
Date: Wed, 6 Nov 2024 15:20:17 -0800
From: Doug Anderson <dianders@...omium.org>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, linux-kernel@...r.kernel.org, kernel-team@...a.com, 
	sched-ext@...a.com, Andrea Righi <arighi@...dia.com>, Changwoo Min <multics69@...il.com>, 
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH sched_ext/for-6.13 2/2] sched_ext: Enable the ops breather
 and eject BPF scheduler on softlockup

Hi,

On Wed, Nov 6, 2024 at 3:07 PM Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
> On Wed, Nov 06, 2024 at 03:02:35PM -0800, Doug Anderson wrote:
> ...
> > Honestly, it would feel better to me if the soft lockup timer didn't
> > tell schedext to kill things but instead we just make some special
> > exception for "schedext" tasks and exclude them from the softlockup
> > detector because they're already being watched by their own watchdog.
> > Would that be possible? Then tweaking the "softlockup" timeouts
> > doesn't implicitly change how long schedext things can run.
>
> Some systems can get into full blown live-lock condition where CPUs are
> barely making forward progress through the scheduler and all normal (!RT &&
> !DEADLINE) tasks are on sched_ext, so the only reasonable way to exclude
> sched_ext would be disabling softlockup detection while sched_ext is
> enabled which doesn't feel like a sound trade-off.

Hmmm, I see.

It still feels wrong to me that the softlockup detector duration
affects how long schedext tasks can run. It feels like a
fundamentally separate knob to be adjusting. You might want to stop
misbehaving schedext tasks really quickly but otherwise keep a longer
softlockup detector timeout. Tying the two together just seems weird.

If we're trying to avoid duplicating code / avoid spinning up extra
timers then it feels like separating out some common code makes sense
and then that common code could be used by both the softlockup
detector and the schedext watchdog. This would allow both to be
configured separately. Yes, you could configure the schedext watchdog
to be effectively "useless" by setting it to be too big, but that's
true of lots of other watchdog-like things that are in the system. You
have to set the timeouts sensibly. Certainly you could make the
default something sensible, at least.
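
A minimal userspace sketch of that idea, to make the shape concrete. All
names here (struct stall_watch, stall_watch_init(), stall_watch_touch(),
stall_watch_expired()) are hypothetical and are not existing kernel
symbols; real code would presumably hook into the existing watchdog
timer rather than take timestamps as arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared stall tracker; not an existing kernel structure. */
struct stall_watch {
	uint64_t timeout_ns;    /* per-user threshold, configured independently */
	uint64_t last_touch_ns; /* last time forward progress was observed */
};

/* Set up one user of the common code with its own timeout. */
static void stall_watch_init(struct stall_watch *w, uint64_t timeout_ns,
			     uint64_t now_ns)
{
	assert(timeout_ns > 0);
	w->timeout_ns = timeout_ns;
	w->last_touch_ns = now_ns;
}

/* Record forward progress for this user. */
static void stall_watch_touch(struct stall_watch *w, uint64_t now_ns)
{
	w->last_touch_ns = now_ns;
}

/* Common check: true once no progress was seen for longer than the timeout. */
static bool stall_watch_expired(const struct stall_watch *w, uint64_t now_ns)
{
	return now_ns - w->last_touch_ns > w->timeout_ns;
}
```

With two instances, one for the softlockup detector and one for the
schedext watchdog, the same check runs for both while each keeps its
own timeout, so changing one does not implicitly change the other.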

In any case, I'm not actually a maintainer here even if I've touched a
lot of this code recently. As I said, if someone more senior wants to
step in and say "Doug, you're wrong and everything looks great" then I
won't be offended.

-Doug
