linux-kernel - Re: [PATCH 09/13] sched_ext: Hook up hardlockup detector

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAD=FV=Ujf7PJxBANGv4e6oVa9YMS4sNLvxp=u+=5n5aaAAn9Cw@mail.gmail.com>
Date: Thu, 13 Nov 2025 14:33:08 -0800
From: Doug Anderson <dianders@...omium.org>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, Andrea Righi <andrea.righi@...ux.dev>, 
	Changwoo Min <changwoo@...lia.com>, Dan Schatzberg <schatzberg.dan@...il.com>, 
	Emil Tsalapatis <etsal@...a.com>, sched-ext@...ts.linux.dev, linux-kernel@...r.kernel.org, 
	Andrew Morton <akpm@...ux-foundation.org>, Andrea Righi <arighi@...dia.com>
Subject: Re: [PATCH 09/13] sched_ext: Hook up hardlockup detector

Hi,

On Tue, Nov 11, 2025 at 11:18 AM Tejun Heo <tj@...nel.org> wrote:
>
> @@ -3711,6 +3711,24 @@ void scx_softlockup(u32 dur_s)
>                         smp_processor_id(), dur_s);
>  }
>
> +/**
> + * scx_hardlockup - sched_ext hardlockup handler
> + *
> + * A poorly behaving BPF scheduler can trigger hard lockup by e.g. putting
> + * numerous affinitized tasks in a single queue and directing all CPUs at it.
> + * Try kicking out the current scheduler in an attempt to recover the system to
> + * a good state before taking more drastic actions.
> + */
> +bool scx_hardlockup(void)

It's really not obvious what the return value for this function means
and it's not documented in the kernel doc. Could you put it there?

> +{
> +       if (!handle_lockup("hard lockup - CPU %d", smp_processor_id()))
> +               return false;

handle_lockup() and its return values also don't appear to be
documented and it's not super obvious (since it goes on to propogate
to scx_verror()).

I spent 5 minutes looking, and my best guess for handle_lockup() behavior:

If it does nothing, it doesn't print anything and returns false. Then
we'll continue to do a hard lockup.

If it has previously kicked scx, it prints the passed message and
returns false. Then we'll continue to do a hard lockup. Why does it
need to print a message in this case, though, since we'll print the
message once we return "false"?

If it disables scx it doesn't print anything and returns true. Then
we'll print a message about scx getting disabled and skip the hard
lockup actions.

Did I get that right? I didn't dig too deeply. I figured if it took me
more than 5 minutes to figure out it needs some documentation...

Also note that the CPU number you print here is a bit confusing. With
the buddy lockup detector the CPU that's locked and the CPU that's
running are different. Shouldn't you pass the locked CPU into this
function if you need to include it in your printouts?

> +       printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
> +                       smp_processor_id());

Should the above be "disabled" instead of "disabling"? Mostly because
(I think) it already happened. Otherwise as a reader of the code I'm
looking to see where the disable happens in the future and I don't see
it.

> @@ -196,6 +196,15 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  #ifdef CONFIG_SYSFS
>                 ++hardlockup_count;
>  #endif
> +               /*
> +                * A poorly behaving BPF scheduler can trigger hard lockup by
> +                * e.g. putting numerous affinitized tasks in a single queue and
> +                * directing all CPUs at it. The following call can return true
> +                * only once when sched_ext is enabled and will immediately
> +                * abort the BPF scheduler and print out a warning message.
> +                */
> +               if (scx_hardlockup())
> +                       return;

Should your test be before the "++hardlockup_count". If you return
early it doesn't seem like you should increment the count?

-Doug