Message-ID: <Zy0VdUvZinPUQeZN@slm.duckdns.org>
Date: Thu, 7 Nov 2024 09:31:01 -1000
From: Tejun Heo <tj@...nel.org>
To: Doug Anderson <dianders@...omium.org>
Cc: David Vernet <void@...ifault.com>, linux-kernel@...r.kernel.org,
kernel-team@...a.com, sched-ext@...a.com,
Andrea Righi <arighi@...dia.com>,
Changwoo Min <multics69@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH sched_ext/for-6.13 2/2] sched_ext: Enable the ops
breather and eject BPF scheduler on softlockup
Hello, Doug.
On Wed, Nov 06, 2024 at 03:20:17PM -0800, Doug Anderson wrote:
> It still feels wrong to me that the softlockup detector duration is
> affecting how long schedext tasks are running. It feels like a
> fundamentally separate knob to be adjusting. You might want to stop
> misbehaving schedext tasks really quickly but otherwise leave the
> softlockup detector to be longer. Tying the two just seems weird.
The tying happens because softlockup can take a really drastic action of
resetting the whole machine.
> If we're trying to avoid duplicating code / avoid spinning up extra
> timers then it feels like separating out some common code makes sense
> and then that common code could be used by both the softlockup
> detector and the schedext watchdog. This would allow both to be
> configured separately. Yes, you could configure the schedext watchdog
> to be effectively "useless" by setting it to be too big, but that's
> true of lots of other watchdog-like things that are in the system. You
> have to set the timeouts sensibly. Certainly you could make the
> default something sensible, at least.
I don't really get the argument. It's just adding a simple notification to
tell another part of the kernel, one which can affect the condition being
detected, that the threshold is imminent, because that part can resolve the
situation in a much more amicable way. I don't see what the big design
problem is.
Sure, if we keep adding those notifications, we'd want to make the mechanism
more generic but that's not a difficult thing to do.
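To make the shape of that notification concrete, here is a minimal user-space
sketch, not the actual patch. Every name in it (check_stall(), notify_sched(),
the action enum) is hypothetical, and the 3/4 early-warning point is an
assumption chosen for illustration. The detector keeps its single threshold
and only gains one extra call before the drastic action would fire.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical action codes, for illustration only. */
enum lockup_action { LOCKUP_NONE, LOCKUP_NOTIFY, LOCKUP_DRASTIC };

/* Hypothetical state standing in for "a BPF scheduler has been ejected". */
static bool sched_ejected;

/* Hypothetical hook: the receiving side of the notification. */
static void notify_sched(unsigned int stalled_s)
{
	if (!sched_ejected) {
		sched_ejected = true;
		printf("stalled for %us: ejecting BPF scheduler\n", stalled_s);
	}
}

/*
 * Called by the detector with how long the CPU has been stalled.
 * Assumption: the notification fires at 3/4 of the threshold, so the
 * scheduler side gets a chance to act before the drastic action does.
 */
static enum lockup_action check_stall(unsigned int stalled_s,
				      unsigned int thresh_s)
{
	if (stalled_s >= thresh_s)
		return LOCKUP_DRASTIC;		/* e.g. panic and reset */
	if (stalled_s >= thresh_s * 3 / 4) {
		notify_sched(stalled_s);	/* the added notification */
		return LOCKUP_NOTIFY;
	}
	return LOCKUP_NONE;
}
```

With a 20s threshold in this sketch, a 15s stall triggers the notification and
a 20s stall reaches the drastic action; the two stay coordinated because they
derive from the same threshold.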
I don't see balance in your argument. Softlockup can already take a
remediative action, a pretty drastic one at that, and that has practical
implications. In this case, it can be pretty easily dealt with. It solves a
practical problem. Even if we refactor everything so that sched-ext can
do softlockup detection on its own (why? what's the benefit?), we still have
a coordination problem which is just brushed away. On the other side of the
scale is three lines of notification code. The trade-off seems pretty clear.
> In any case, I'm not actually a maintainer here even if I've touched a
> lot of this code recently. As I said, if someone more senior wants to
> step in and say "Doug, you're wrong and everything looks great" then I
> won't be offended.
Andrew, if you don't object, I'll route the patches through the sched-ext
tree.
Thanks.
--
tejun