Message-ID: <877bt29cgv.ffs@tglx>
Date: Wed, 28 Jan 2026 12:57:20 +0100
From: Thomas Gleixner <tglx@...nel.org>
To: Ihor Solodrai <ihor.solodrai@...ux.dev>, LKML
 <linux-kernel@...r.kernel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Gabriele Monaco
 <gmonaco@...hat.com>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 Michael Jeanson <mjeanson@...icios.com>, Jens Axboe <axboe@...nel.dk>,
 "Paul E. McKenney" <paulmck@...nel.org>, "Gautham R. Shenoy"
 <gautham.shenoy@....com>, Florian Weimer <fweimer@...hat.com>, Tim Chen
 <tim.c.chen@...el.com>, Yury Norov <yury.norov@...il.com>, Shrikanth Hegde
 <sshegde@...ux.ibm.com>, bpf <bpf@...r.kernel.org>,
 sched-ext@...ts.linux.dev, Kernel Team <kernel-team@...a.com>, Alexei
 Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>, Daniel
 Borkmann <daniel@...earbox.net>, Puranjay Mohan <puranjay@...nel.org>,
 Tejun Heo <tj@...nel.org>
Subject: Re: [patch V5 00/20] sched: Rewrite MM CID management

On Tue, Jan 27 2026 at 16:01, Ihor Solodrai wrote:
> BPF CI caught a deadlock on current bpf-next tip (35538dba51b4).
> Job: https://github.com/kernel-patches/bpf/actions/runs/21417415035/job/61670254640
>
> It appears to be related to this series. Pasting a splat below.

The deadlock splat is completely unrelated, as it is a consequence of the
panic, which is triggered by the watchdog:

> [   45.009755] watchdog: CPU2: Watchdog detected hard LOCKUP on cpu 2

...

> [   46.053170]   lock(&nmi_desc[NMI_LOCAL].lock);
> [   46.053172]   <Interrupt>
> [   46.053173]     lock(&nmi_desc[NMI_LOCAL].lock);

...

> Any ideas what might be going on?

Without a full backtrace of all CPUs, it's hard to tell, because it's
unclear what is holding the runqueue lock of CPU2 long enough to trigger
the hard lockup watchdog.

I'm pretty sure the CID changes are unrelated; that new code just happens
to show up as the messenger which gets stuck on the lock forever.

> [   46.053209] CPU: 2 UID: 0 PID: 126 Comm: test_progs Tainted: G           OE       6.19.0-rc5-g748c6d52700a-dirty #1 PREEMPT(full)
> [   46.053214] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [   46.053215] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [   46.053217] Call Trace:
> [   46.053220]  <NMI>
> [   46.053223]  dump_stack_lvl+0x5d/0x80
> [   46.053227]  print_usage_bug.part.0+0x22b/0x2c0
> [   46.053231]  lock_acquire+0x272/0x2b0
> [   46.053235]  ? __register_nmi_handler+0x83/0x350
> [   46.053240]  _raw_spin_lock_irqsave+0x39/0x60
> [   46.053242]  ? __register_nmi_handler+0x83/0x350
> [   46.053246]  __register_nmi_handler+0x83/0x350
> [   46.053250]  native_stop_other_cpus+0x31c/0x460
> [   46.053255]  ? __pfx_native_stop_other_cpus+0x10/0x10
> [   46.053260]  vpanic+0x1c5/0x3f0

vpanic() really should disable lockdep here before taking that lock in
NMI context. The resulting lockdep splat is not really useful.

Thanks.

        tglx
