linux-kernel - Re: [PATCH 09/13] sched_ext: Hook up hardlockup detector

Message-ID: <aRaE7ckOxUtDCcqU@slm.duckdns.org>
Date: Thu, 13 Nov 2025 15:25:01 -1000
From: Tejun Heo <tj@...nel.org>
To: Doug Anderson <dianders@...omium.org>
Cc: David Vernet <void@...ifault.com>,
	Andrea Righi <andrea.righi@...ux.dev>,
	Changwoo Min <changwoo@...lia.com>,
	Dan Schatzberg <schatzberg.dan@...il.com>,
	Emil Tsalapatis <etsal@...a.com>, sched-ext@...ts.linux.dev,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andrea Righi <arighi@...dia.com>
Subject: Re: [PATCH 09/13] sched_ext: Hook up hardlockup detector

Hello,

On Thu, Nov 13, 2025 at 02:33:08PM -0800, Doug Anderson wrote:
> > +bool scx_hardlockup(void)
> 
> It's really not obvious what the return value for this function means
> and it's not documented in the kernel doc. Could you put it there?
...
> handle_lockup() and its return values also don't appear to be
> documented and it's not super obvious (since it goes on to propagate
> to scx_verror()).
> 
> I spent 5 minutes looking, and my best guess for handle_lockup() behavior:

Will add documentation.

> If it does nothing, it doesn't print anything and returns false. Then
> we'll continue to do a hard lockup.
>
> If it has previously kicked scx, it prints the passed message and
> returns false. Then we'll continue to do a hard lockup. Why does it
> need to print a message in this case, though, since we'll print the
> message once we return "false"?

If abort was already initiated, it does nothing. No message printed. The
message passed into handle_lockup() is for reporting on sched_ext side.

> If it disables scx it doesn't print anything and returns true. Then
> we'll print a message about scx getting disabled and skip the hard
> lockup actions.

If it initiates disabling, it prints out that sched_ext is being disabled.
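
To make the three cases above concrete, here is a minimal userspace sketch of the return-value contract as described in this reply. It is not the kernel implementation: `toy_scx`, `toy_handle_lockup()` and their fields are hypothetical names invented for illustration, and the message format is only modeled on the patch's printk.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for sched_ext state; not the kernel's real layout. */
struct toy_scx {
	bool enabled;		/* a BPF scheduler is loaded */
	bool exit_claimed;	/* an abort was already initiated */
	int msgs_printed;	/* how many "disabling" messages were emitted */
};

/*
 * Returns true only when this call initiates disabling. Returns false when
 * sched_ext is not enabled or when an abort is already in progress, in which
 * case nothing is printed and the caller continues with its normal lockup
 * handling.
 */
static bool toy_handle_lockup(struct toy_scx *s, const char *what)
{
	if (!s->enabled)
		return false;
	if (s->exit_claimed)
		return false;

	s->exit_claimed = true;
	fprintf(stderr, "sched_ext: %s, disabling BPF scheduler\n", what);
	s->msgs_printed++;
	return true;
}
```

With this contract, a second lockup report against the same scheduler falls through to the regular hard lockup path, because the first call has already claimed the exit.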

> Also note that the CPU number you print here is a bit confusing. With
> the buddy lockup detector the CPU that's locked and the CPU that's
> running are different. Shouldn't you pass the locked CPU into this
> function if you need to include it in your printouts?

Good point. Will update.

> > +       printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
> > +                       smp_processor_id());
> 
> Should the above be "disabled" instead of "disabling"? Mostly because
> (I think) it already happened. Otherwise as a reader of the code I'm
> looking to see where the disable happens in the future and I don't see
> it.

It initiates disabling but disabling is asynchronous. The first step of
disabling - aborting in-flight operations and falling back to safe in-kernel
scheduling is done synchronously by scx_claim_exit(), so there's an
immediate effect; however, there's a whole lot more that happens
asynchronously in scx_disable_workfn() afterwards.
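
The split between the synchronous claim and the asynchronous teardown can be sketched as below. This is a toy model under stated assumptions, not the code behind scx_claim_exit() or scx_disable_workfn(): every `toy_` name and field is hypothetical, and the deferred work is represented by a plain flag rather than a real work item.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum toy_exit_kind { TOY_EXIT_NONE = 0, TOY_EXIT_ERROR = 1 };

/* Hypothetical state; field names are illustrative only. */
struct toy_exit_state {
	atomic_int exit_kind;	/* TOY_EXIT_NONE until someone claims the exit */
	bool bypass;		/* safe fallback scheduling engaged */
	bool work_pending;	/* asynchronous teardown queued */
	bool torn_down;		/* asynchronous teardown finished */
};

/*
 * Synchronous step: only the first caller wins the claim. The winner
 * switches to the safe fallback right away and queues the rest.
 */
static bool toy_claim_exit(struct toy_exit_state *s, int kind)
{
	int none = TOY_EXIT_NONE;

	if (!atomic_compare_exchange_strong(&s->exit_kind, &none, kind))
		return false;

	s->bypass = true;
	s->work_pending = true;
	return true;
}

/* Asynchronous step: runs later and completes the teardown. */
static void toy_disable_workfn(struct toy_exit_state *s)
{
	if (!s->work_pending)
		return;
	s->work_pending = false;
	s->torn_down = true;
}
```

Between the two calls the model is in the "disabling" state: the fallback is already active but teardown has not completed, which is why the message says "disabling" rather than "disabled".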

> > @@ -196,6 +196,15 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> >  #ifdef CONFIG_SYSFS
> >                 ++hardlockup_count;
> >  #endif
> > +               /*
> > +                * A poorly behaving BPF scheduler can trigger hard lockup by
> > +                * e.g. putting numerous affinitized tasks in a single queue and
> > +                * directing all CPUs at it. The following call can return true
> > +                * only once when sched_ext is enabled and will immediately
> > +                * abort the BPF scheduler and print out a warning message.
> > +                */
> > +               if (scx_hardlockup())
> > +                       return;
> 
> Should your test be before the "++hardlockup_count"? If you return
> early it doesn't seem like you should increment the count?

I don't know. It is still a hardlockup event. We just first try to abort
sched_ext as that has a reasonable chance to resolve the condition, and, if
that succeeds, there will be messages indicating hardlockup occurred and
sched_ext was disabled. Wouldn't it be confusing if the reported hardlockup
count doesn't reflect that?
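
A small sketch of the ordering being argued for, assuming a simplified check routine. All `toy_` names are hypothetical and the real watchdog_hardlockup_check() does much more. The counter is bumped before the sched_ext attempt, so an event resolved by aborting the BPF scheduler is still counted.

```c
#include <stdbool.h>

static unsigned int toy_hardlockup_count;	/* every detected event */
static unsigned int toy_lockup_actions;		/* events that reached the normal path */
static bool toy_scx_can_abort;			/* a scheduler is loaded and not yet aborted */

/* Returns true at most once per loaded scheduler, mirroring the patch comment. */
static bool toy_scx_hardlockup(void)
{
	if (!toy_scx_can_abort)
		return false;
	toy_scx_can_abort = false;
	return true;
}

static void toy_watchdog_hardlockup_check(void)
{
	/* Count first: this is a hard lockup event either way. */
	++toy_hardlockup_count;

	/* Try aborting the BPF scheduler before the usual lockup actions. */
	if (toy_scx_hardlockup())
		return;

	++toy_lockup_actions;
}
```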

Thanks.

-- 
tejun