Message-ID: <CAD=FV=XuvhaQPoLN7q5JnraBGggN90aXPoSEFG-H80i368u5Xg@mail.gmail.com>
Date: Wed, 6 Nov 2024 15:20:17 -0800
From: Doug Anderson <dianders@...omium.org>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, linux-kernel@...r.kernel.org, kernel-team@...a.com,
sched-ext@...a.com, Andrea Righi <arighi@...dia.com>, Changwoo Min <multics69@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH sched_ext/for-6.13 2/2] sched_ext: Enable the ops breather and eject BPF scheduler on softlockup
Hi,

On Wed, Nov 6, 2024 at 3:07 PM Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
> On Wed, Nov 06, 2024 at 03:02:35PM -0800, Doug Anderson wrote:
> ...
> > Honestly, it would feel better to me if the soft lockup timer didn't
> > tell schedext to kill things but instead we just make some special
> > exception for "schedext" tasks and exclude them from the softlockup
> > detector because they're already being watched by their own watchdog.
> > Would that be possible? Then tweaking the "softlockup" timeouts
> > doesn't implicitly change how long schedext things can run.
>
> Some systems can get into full blown live-lock condition where CPUs are
> barely making forward progress through the scheduler and all normal (!RT &&
> !DEADLINE) tasks are on sched_ext, so the only reasonable way to exclude
> sched_ext would be disabling softlockup detection while sched_ext is
> enabled which doesn't feel like a sound trade-off.

Hmmm, I see.

It still feels wrong to me that the softlockup detector timeout is
affecting how long schedext tasks can run. It feels like a
fundamentally separate knob. You might want to stop misbehaving
schedext tasks really quickly but still leave the softlockup detector
timeout longer. Tying the two just seems weird.

If we're trying to avoid duplicating code or spinning up extra timers,
then it makes sense to separate out some common code that both the
softlockup detector and the schedext watchdog can use. That would
allow the two to be configured separately. Yes, you could make the
schedext watchdog effectively "useless" by setting its timeout too
high, but that's true of lots of other watchdog-like things in the
system. You have to set the timeouts sensibly. Certainly you could
make the default something sensible, at least.

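To illustrate the shape I have in mind, here's a rough user-space C
sketch (not kernel code). Every name in it (struct demo_progress,
demo_stalled_for(), demo_check_cpu(), the two timeout variables and
their values) is made up for illustration and is not an existing
kernel API; time is passed in explicitly as seconds. One shared helper
answers "has this CPU gone without progress for longer than X?", and
each watchdog calls it with its own, separately configured timeout:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU progress record; not a real kernel structure. */
struct demo_progress {
	unsigned long last_progress_s;	/* when progress was last observed */
};

/*
 * Two independent knobs (made-up values) instead of deriving the
 * schedext limit from the softlockup threshold.
 */
static unsigned long demo_softlockup_timeout_s = 20;
static unsigned long demo_scx_watchdog_timeout_s = 5;

enum demo_verdict {
	DEMO_OK,
	DEMO_EJECT_BPF_SCHED,
	DEMO_REPORT_SOFTLOCKUP,
};

/* Shared helper: has this CPU gone longer than @timeout_s without progress? */
static bool demo_stalled_for(const struct demo_progress *p,
			     unsigned long now_s, unsigned long timeout_s)
{
	assert(now_s >= p->last_progress_s);	/* guard unsigned wraparound */
	return now_s - p->last_progress_s > timeout_s;
}

/* Each (hypothetical) watchdog applies the shared check with its own timeout. */
static enum demo_verdict demo_check_cpu(const struct demo_progress *p,
					unsigned long now_s, bool scx_enabled)
{
	if (scx_enabled &&
	    demo_stalled_for(p, now_s, demo_scx_watchdog_timeout_s))
		return DEMO_EJECT_BPF_SCHED;
	if (demo_stalled_for(p, now_s, demo_softlockup_timeout_s))
		return DEMO_REPORT_SOFTLOCKUP;
	return DEMO_OK;
}
```

With these made-up defaults, a 10 s stall while sched_ext is enabled
ejects the BPF scheduler, while the softlockup report still waits
until the stall exceeds 20 s. Changing one timeout doesn't move the
other.
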
In any case, I'm not actually a maintainer here even if I've touched a
lot of this code recently. As I said, if someone more senior wants to
step in and say "Doug, you're wrong and everything looks great" then I
won't be offended.

-Doug