Message-ID: <CAD=FV=U7z-Lf_1T2cYyae3b6W5Joyp+oiRSp-iXe_3jz9Aqoaw@mail.gmail.com>
Date: Wed, 6 Nov 2024 13:32:40 -0800
From: Doug Anderson <dianders@...omium.org>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, linux-kernel@...r.kernel.org, kernel-team@...a.com,
sched-ext@...a.com, Andrea Righi <arighi@...dia.com>, Changwoo Min <multics69@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH sched_ext/for-6.13 2/2] sched_ext: Enable the ops breather and eject BPF scheduler on softlockup

Hi,

On Tue, Nov 5, 2024 at 1:49 PM Tejun Heo <tj@...nel.org> wrote:
>
> On 2 x Intel Sapphire Rapids machines with 224 logical CPUs, a poorly
> behaving BPF scheduler can live-lock the system by making multiple CPUs bang
> on the same DSQ to the point where soft-lockup detection triggers before
> SCX's own watchdog can take action. It also seems possible that the machine
> can be live-locked enough to prevent scx_ops_helper, which is an RT task,
> from running in a timely manner.
>
> Implement scx_softlockup() which is called when three quarters of
> soft-lockup threshold has passed. The function immediately enables the ops
> breather and triggers an ops error to initiate ejection of the BPF
> scheduler.
>
> The previous and this patch combined enable the kernel to reliably recover
> the system from live-lock conditions that can be triggered by a poorly
> behaving BPF scheduler on Intel dual socket systems.
>
> Signed-off-by: Tejun Heo <tj@...nel.org>
> Cc: Douglas Anderson <dianders@...omium.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> ---
> include/linux/sched/ext.h | 2 +
> kernel/sched/ext.c | 45 ++++++++++++++++++++++++++++++++++++++
> kernel/watchdog.c | 8 ++++++
> tools/sched_ext/scx_show_state.py | 2 +
> 4 files changed, 57 insertions(+)

If someone more senior wants to override me then that's fine, but to
me this feels a bit too ugly/hacky to land. Specifically:

1. It doesn't feel right to add knowledge of "sched-ext" to the
softlockup detector. You're calling from a generic part of the kernel
to a specific part and it just feels unexpected, like there should be
some better boundaries between the two.

2. You're relying on a debug feature to enforce correctness. The
softlockup detector isn't designed to _fix_ softlockups. It's designed
to detect and report softlockups and then possibly reboot the machine.
Someone would not expect that turning on this debug feature would
cause the system to take the action of kicking out a BPF scheduler.

It feels like sched-ext should fix its own watchdog so it detects and
fixes the problem before the softlockup detector does.

-Doug