Message-ID: <aRaE7ckOxUtDCcqU@slm.duckdns.org>
Date: Thu, 13 Nov 2025 15:25:01 -1000
From: Tejun Heo <tj@...nel.org>
To: Doug Anderson <dianders@...omium.org>
Cc: David Vernet <void@...ifault.com>,
Andrea Righi <andrea.righi@...ux.dev>,
Changwoo Min <changwoo@...lia.com>,
Dan Schatzberg <schatzberg.dan@...il.com>,
Emil Tsalapatis <etsal@...a.com>, sched-ext@...ts.linux.dev,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Andrea Righi <arighi@...dia.com>
Subject: Re: [PATCH 09/13] sched_ext: Hook up hardlockup detector
Hello,
On Thu, Nov 13, 2025 at 02:33:08PM -0800, Doug Anderson wrote:
> > +bool scx_hardlockup(void)
>
> It's really not obvious what the return value for this function means
> and it's not documented in the kernel doc. Could you put it there?
...
> handle_lockup() and its return values also don't appear to be
> documented and it's not super obvious (since it goes on to propagate
> to scx_verror()).
>
> I spent 5 minutes looking, and my best guess for handle_lockup() behavior:
Will add documentation.
> If it does nothing, it doesn't print anything and returns false. Then
> we'll continue to do a hard lockup.
>
> If it has previously kicked scx, it prints the passed message and
> returns false. Then we'll continue to do a hard lockup. Why does it
> need to print a message in this case, though, since we'll print the
> message once we return "false"?
If abort was already initiated, it does nothing. No message is printed. The
message passed into handle_lockup() is for reporting on the sched_ext side.
> If it disables scx it doesn't print anything and returns true. Then
> we'll print a message about scx getting disabled and skip the hard
> lockup actions.
If it initiates disabling, it prints out that sched_ext is being disabled.
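To make the three outcomes concrete, here is a minimal userspace sketch of the return-value semantics described above. It is a model only: the model_* names, the atomic flag, and the message counter are hypothetical stand-ins, and the "not enabled returns false" case is an assumption taken from the code comment in the patch rather than from the actual handle_lockup() body.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; not the kernel's actual data structures. */
enum model_exit_kind { MODEL_EXIT_NONE, MODEL_EXIT_ERROR };

static bool model_enabled;                       /* is a BPF scheduler loaded? */
static atomic_int model_exit_state = MODEL_EXIT_NONE;
static int model_messages;                       /* reports emitted so far */

/*
 * Returns true only for the single caller that initiates the abort.
 * Returns false, silently, if nothing is loaded or if an abort was
 * already initiated by someone else.
 */
static bool model_handle_lockup(const char *msg, int cpu)
{
	int expected = MODEL_EXIT_NONE;

	if (!model_enabled)
		return false;

	/* Only the first caller wins the exchange; later callers do nothing. */
	if (!atomic_compare_exchange_strong(&model_exit_state, &expected,
					    MODEL_EXIT_ERROR))
		return false;

	model_messages++;
	fprintf(stderr, "sched_ext (model): %s - CPU %d, disabling BPF scheduler\n",
		msg, cpu);
	return true;
}
```

In this model the caller can treat "true" as "sched_ext took the event and is being torn down" and "false" as "carry on with the regular lockup handling".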
> Also note that the CPU number you print here is a bit confusing. With
> the buddy lockup detector the CPU that's locked and the CPU that's
> running are different. Shouldn't you pass the locked CPU into this
> function if you need to include it in your printouts?
Good point. Will update.
> > + printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
> > + smp_processor_id());
>
> Should the above be "disabled" instead of "disabling"? Mostly because
> (I think) it already happened. Otherwise as a reader of the code I'm
> looking to see where the disable happens in the future and I don't see
> it.
It initiates disabling, but disabling is asynchronous. The first step of
disabling (aborting in-flight operations and falling back to safe in-kernel
scheduling) is done synchronously by scx_claim_exit(), so there's an
immediate effect; however, there's a whole lot more that happens
asynchronously in scx_disable_workfn() afterwards.
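A rough userspace sketch of that split, assuming a simplified state machine: the struct, the field names, and the model_* functions are all hypothetical, and the deferred work is modeled as a plain flag rather than a real workqueue item.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the scheduler's lifecycle; not kernel code. */
enum model_state { MODEL_ENABLED, MODEL_DISABLING, MODEL_DISABLED };

struct model_sched {
	atomic_bool exit_claimed;   /* set once by the first claimer */
	bool bypass;                /* safe in-kernel fallback scheduling active */
	bool work_pending;          /* stands in for queued disable work */
	enum model_state state;
};

/* Synchronous part: takes effect before the caller returns. */
static bool model_claim_exit(struct model_sched *s)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&s->exit_claimed, &expected, true))
		return false;           /* someone else already claimed the exit */

	s->bypass = true;           /* immediate effect: fall back to safe scheduling */
	s->state = MODEL_DISABLING;
	s->work_pending = true;     /* the rest is deferred */
	return true;
}

/* Asynchronous part: runs later, finishes the teardown. */
static void model_disable_workfn(struct model_sched *s)
{
	if (!s->work_pending)
		return;
	s->work_pending = false;
	s->state = MODEL_DISABLED;
}
```

Between the two calls the model sits in MODEL_DISABLING, which is why the message says "disabling" rather than "disabled".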
> > @@ -196,6 +196,15 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > #ifdef CONFIG_SYSFS
> > ++hardlockup_count;
> > #endif
> > + /*
> > + * A poorly behaving BPF scheduler can trigger hard lockup by
> > + * e.g. putting numerous affinitized tasks in a single queue and
> > + * directing all CPUs at it. The following call can return true
> > + * only once when sched_ext is enabled and will immediately
> > + * abort the BPF scheduler and print out a warning message.
> > + */
> > + if (scx_hardlockup())
> > + return;
>
> Should your test be before the "++hardlockup_count"? If you return
> early it doesn't seem like you should increment the count?
I don't know. It is still a hardlockup event. We just first try to abort
sched_ext as that has a reasonable chance to resolve the condition, and, if
that succeeds, there will be messages indicating hardlockup occurred and
sched_ext was disabled. Wouldn't it be confusing if the reported hardlockup
count doesn't reflect that?
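A small userspace model of the ordering in question, with hypothetical model_* names and a one-shot flag standing in for "sched_ext is enabled and not yet aborted": the counter is bumped for every detected event, and only the follow-up actions are skipped when the sched_ext hook takes the event.

```c
#include <stdbool.h>

static unsigned long model_hardlockup_count;  /* events detected */
static int model_lockup_actions;              /* regular hard lockup handling runs */
static bool model_scx_can_abort = true;       /* hypothetical: one abort available */

/* Returns true at most once, mirroring the comment in the patch. */
static bool model_scx_hardlockup(void)
{
	if (!model_scx_can_abort)
		return false;
	model_scx_can_abort = false;
	return true;
}

static void model_watchdog_check(void)
{
	/* Count first: it is a hardlockup event either way. */
	++model_hardlockup_count;

	/* If sched_ext takes the event, skip the regular actions. */
	if (model_scx_hardlockup())
		return;

	model_lockup_actions++;
}
```

With this ordering, two detected events yield a count of two even though only the second one reaches the regular handling.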
Thanks.
--
tejun