Message-ID: <877bt29cgv.ffs@tglx>
Date: Wed, 28 Jan 2026 12:57:20 +0100
From: Thomas Gleixner <tglx@...nel.org>
To: Ihor Solodrai <ihor.solodrai@...ux.dev>, LKML
<linux-kernel@...r.kernel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Gabriele Monaco
<gmonaco@...hat.com>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Michael Jeanson <mjeanson@...icios.com>, Jens Axboe <axboe@...nel.dk>,
"Paul E. McKenney" <paulmck@...nel.org>, "Gautham R. Shenoy"
<gautham.shenoy@....com>, Florian Weimer <fweimer@...hat.com>, Tim Chen
<tim.c.chen@...el.com>, Yury Norov <yury.norov@...il.com>, Shrikanth Hegde
<sshegde@...ux.ibm.com>, bpf <bpf@...r.kernel.org>,
sched-ext@...ts.linux.dev, Kernel Team <kernel-team@...a.com>, Alexei
Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>, Daniel
Borkmann <daniel@...earbox.net>, Puranjay Mohan <puranjay@...nel.org>,
Tejun Heo <tj@...nel.org>
Subject: Re: [patch V5 00/20] sched: Rewrite MM CID management
On Tue, Jan 27 2026 at 16:01, Ihor Solodrai wrote:
> BPF CI caught a deadlock on current bpf-next tip (35538dba51b4).
> Job: https://github.com/kernel-patches/bpf/actions/runs/21417415035/job/61670254640
>
> It appears to be related to this series. Pasting a splat below.
The deadlock splat is completely unrelated as it is a consequence of the
panic which is triggered by the watchdog:
> [ 45.009755] watchdog: CPU2: Watchdog detected hard LOCKUP on cpu 2
...
> [ 46.053170] lock(&nmi_desc[NMI_LOCAL].lock);
> [ 46.053172] <Interrupt>
> [ 46.053173] lock(&nmi_desc[NMI_LOCAL].lock);
...
> Any ideas what might be going on?
Without a full backtrace of all CPUs it's hard to tell because it's
unclear what is holding the runqueue lock of CPU2 long enough to trigger
the hard lockup watchdog.
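For reference, a minimal sketch of how the all-CPU backtrace could be
obtained on the next occurrence. This assumes the CI kernel is built with
CONFIG_HARDLOCKUP_DETECTOR and SMP, and it has not been tried against the
BPF CI setup:

```shell
# Assumption: CONFIG_HARDLOCKUP_DETECTOR=y and an SMP kernel.
# Ask the hard lockup detector to dump the stacks of all CPUs,
# not only the stack of the CPU which is stuck.
sysctl -w kernel.hardlockup_all_cpu_backtrace=1

# Alternatively, set it on the kernel command line:
#   hardlockup_all_cpu_backtrace=1
```

With that set, the watchdog report should include the other CPUs, which
would show who owns the runqueue lock of CPU2.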
I'm pretty sure the CID changes are unrelated; the new code just happens
to show up as the messenger, which gets stuck on the lock forever.
> [ 46.053209] CPU: 2 UID: 0 PID: 126 Comm: test_progs Tainted: G OE 6.19.0-rc5-g748c6d52700a-dirty #1 PREEMPT(full)
> [ 46.053214] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [ 46.053215] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [ 46.053217] Call Trace:
> [ 46.053220] <NMI>
> [ 46.053223] dump_stack_lvl+0x5d/0x80
> [ 46.053227] print_usage_bug.part.0+0x22b/0x2c0
> [ 46.053231] lock_acquire+0x272/0x2b0
> [ 46.053235] ? __register_nmi_handler+0x83/0x350
> [ 46.053240] _raw_spin_lock_irqsave+0x39/0x60
> [ 46.053242] ? __register_nmi_handler+0x83/0x350
> [ 46.053246] __register_nmi_handler+0x83/0x350
> [ 46.053250] native_stop_other_cpus+0x31c/0x460
> [ 46.053255] ? __pfx_native_stop_other_cpus+0x10/0x10
> [ 46.053260] vpanic+0x1c5/0x3f0
vpanic() really should disable lockdep here before taking that lock in
NMI context. The resulting lockdep splat is not really useful.
Thanks.
tglx