Message-ID: <CAD=FV=U7z-Lf_1T2cYyae3b6W5Joyp+oiRSp-iXe_3jz9Aqoaw@mail.gmail.com>
Date: Wed, 6 Nov 2024 13:32:40 -0800
From: Doug Anderson <dianders@...omium.org>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, linux-kernel@...r.kernel.org, kernel-team@...a.com, 
	sched-ext@...a.com, Andrea Righi <arighi@...dia.com>, Changwoo Min <multics69@...il.com>, 
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH sched_ext/for-6.13 2/2] sched_ext: Enable the ops breather
 and eject BPF scheduler on softlockup

Hi,

On Tue, Nov 5, 2024 at 1:49 PM Tejun Heo <tj@...nel.org> wrote:
>
> On 2 x Intel Sapphire Rapids machines with 224 logical CPUs, a poorly
> behaving BPF scheduler can live-lock the system by making multiple CPUs bang
> on the same DSQ to the point where soft-lockup detection triggers before
> SCX's own watchdog can take action. It also seems possible that the machine
> can be live-locked enough to prevent scx_ops_helper, which is an RT task,
> from running in a timely manner.
>
> Implement scx_softlockup() which is called when three quarters of
> soft-lockup threshold has passed. The function immediately enables the ops
> breather and triggers an ops error to initiate ejection of the BPF
> scheduler.
>
> The previous and this patch combined enable the kernel to reliably recover
> the system from live-lock conditions that can be triggered by a poorly
> behaving BPF scheduler on Intel dual socket systems.
>
> Signed-off-by: Tejun Heo <tj@...nel.org>
> Cc: Douglas Anderson <dianders@...omium.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> ---
>  include/linux/sched/ext.h         |    2 +
>  kernel/sched/ext.c                |   45 ++++++++++++++++++++++++++++++++++++++
>  kernel/watchdog.c                 |    8 ++++++
>  tools/sched_ext/scx_show_state.py |    2 +
>  4 files changed, 57 insertions(+)

If someone more senior wants to override me then that's fine, but to
me this feels a bit too ugly/hacky to land. Specifically:

1. It doesn't feel right to add knowledge of "sched-ext" to the
softlockup detector. You're calling from a generic part of the kernel
to a specific part and it just feels unexpected, like there should be
some better boundaries between the two.

2. You're relying on a debug feature to enforce correctness. The
softlockup detector isn't designed to _fix_ softlockups. It's designed
to detect and report softlockups and then possibly reboot the machine.
Nobody would expect that turning on this debug feature would cause
the system to kick out a BPF scheduler.


It feels like sched-ext should fix its own watchdog so it detects and
fixes the problem before the softlockup detector does.
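
To make the ordering I have in mind concrete, here is a rough sketch.
It is not kernel code: both helper names are made up for illustration,
and the only number taken from the patch is the "three quarters of the
soft-lockup threshold" point at which scx_softlockup() would be called.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel function): the point, in seconds,
 * at which the patch would call scx_softlockup(), i.e. three quarters
 * of the soft-lockup threshold.
 */
static unsigned int softlockup_hook_point_secs(unsigned int softlockup_thresh_secs)
{
	return softlockup_thresh_secs * 3 / 4;
}

/*
 * Hypothetical check: does sched-ext's own watchdog timeout expire
 * strictly before the softlockup hook would have fired? If it does,
 * the softlockup detector never needs to know about sched-ext.
 */
static bool scx_watchdog_fires_first(unsigned int scx_timeout_secs,
				     unsigned int softlockup_thresh_secs)
{
	return scx_timeout_secs < softlockup_hook_point_secs(softlockup_thresh_secs);
}
```

With an assumed 20 second soft-lockup threshold the hook point is 15
seconds, so a sched-ext timeout of 10 seconds would win the race and a
timeout of 30 seconds would not. This only covers the timing side. It
says nothing about whether the sched-ext watchdog can actually run
while the machine is live-locked.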

-Doug
