Message-ID: <Zy0VdUvZinPUQeZN@slm.duckdns.org>
Date: Thu, 7 Nov 2024 09:31:01 -1000
From: Tejun Heo <tj@...nel.org>
To: Doug Anderson <dianders@...omium.org>
Cc: David Vernet <void@...ifault.com>, linux-kernel@...r.kernel.org,
	kernel-team@...a.com, sched-ext@...a.com,
	Andrea Righi <arighi@...dia.com>,
	Changwoo Min <multics69@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH sched_ext/for-6.13 2/2] sched_ext: Enable the ops
 breather and eject BPF scheduler on softlockup

Hello, Doug.

On Wed, Nov 06, 2024 at 03:20:17PM -0800, Doug Anderson wrote:
> It still feels wrong to me that the softlockup detector duration is
> affecting how long schedext tasks are running. It feels like a
> fundamentally separate knob to be adjusting. You might want to stop
> misbehaving schedext tasks really quickly but otherwise leave the
> softlockup detector to be longer. Tying the two just seems weird.

The tying happens because softlockup can take a really drastic action of
resetting the whole machine.

> If we're trying to avoid duplicating code / avoid spinning up extra
> timers then it feels like separating out some common code makes sense
> and then that common code could be used by both the softlockup
> detector and the schedext watchdog. This would allow both to be
> configured separately. Yes, you could configure the schedext watchdog
> to be effectively "useless" by setting it to be too big, but that's
> true of lots of other watchdog-like things that are in the system. You
> have to set the timeouts sensibly. Certainly you could make the
> default something sensible, at least.

I don't really get the argument. It's just adding a simple notification to
tell another part of the kernel, one which can affect the condition being
detected, that the threshold is imminent, because it can resolve the
situation in a much more amicable way. I don't see what the big design
problem is. Sure, if we keep adding those notifications, we'd want to make
the mechanism more generic, but that's not a difficult thing to do.

I don't see balance in your argument. Softlockup can already take a
remediative action, a pretty drastic one at that, and that has practical
implications. In this case, it can be pretty easily dealt with. It solves a
practical problem. Even if we refactor everything so that sched-ext can do
softlockup detection on its own (why? what's the benefit?), we still have a
coordination problem which is just being brushed away. On the other side of
the scale is three lines of notification code. The trade-off seems pretty
clear.

> In any case, I'm not actually a maintainer here even if I've touched a
> lot of this code recently. As I said, if someone more senior wants to
> step in and say "Doug, you're wrong and everything looks great" then I
> won't be offended.

Andrew, if you don't object, I'll route the patches through the sched-ext
tree.

Thanks.

-- 
tejun
