Message-ID: <bdfea828-4585-40e8-8835-247c6a8a76b0@linux.ibm.com>
Date: Wed, 28 Jan 2026 18:28:23 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Thomas Gleixner <tglx@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
        Ihor Solodrai <ihor.solodrai@...ux.dev>,
        LKML <linux-kernel@...r.kernel.org>
Cc: Gabriele Monaco <gmonaco@...hat.com>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Michael Jeanson <mjeanson@...icios.com>, Jens Axboe <axboe@...nel.dk>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        "Gautham R. Shenoy" <gautham.shenoy@....com>,
        Florian Weimer <fweimer@...hat.com>, Tim Chen <tim.c.chen@...el.com>,
        Yury Norov <yury.norov@...il.com>, bpf <bpf@...r.kernel.org>,
        sched-ext@...ts.linux.dev, Kernel Team <kernel-team@...a.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Puranjay Mohan
 <puranjay@...nel.org>, Tejun Heo <tj@...nel.org>
Subject: Re: [patch V5 00/20] sched: Rewrite MM CID management



On 1/28/26 5:27 PM, Thomas Gleixner wrote:
> On Tue, Jan 27 2026 at 16:01, Ihor Solodrai wrote:
>> BPF CI caught a deadlock on current bpf-next tip (35538dba51b4).
>> Job: https://github.com/kernel-patches/bpf/actions/runs/21417415035/job/61670254640
>>
>> It appears to be related to this series. Pasting a splat below.
> 
> The deadlock splat is completely unrelated as it is a consequence of the
> panic which is triggered by the watchdog:
> 
>> [   45.009755] watchdog: CPU2: Watchdog detected hard LOCKUP on cpu 2
> 
> ...
> 
>> [   46.053170]   lock(&nmi_desc[NMI_LOCAL].lock);
>> [   46.053172]   <Interrupt>
>> [   46.053173]     lock(&nmi_desc[NMI_LOCAL].lock);
> 
> ...
> 
>> Any ideas what might be going on?
> 
> Without a full backtrace of all CPUs it's hard to tell because it's
> unclear what is holding the runqueue lock of CPU2 long enough to trigger
> the hard lockup watchdog.
> 
> I'm pretty sure the CID changes are unrelated; that new code just happens
> to show up as the messenger which gets stuck on the lock forever.
> 
>> [   46.053209] CPU: 2 UID: 0 PID: 126 Comm: test_progs Tainted: G           OE       6.19.0-rc5-g748c6d52700a-dirty #1 PREEMPT(full)
>> [   46.053214] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>> [   46.053215] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
>> [   46.053217] Call Trace:
>> [   46.053220]  <NMI>
>> [   46.053223]  dump_stack_lvl+0x5d/0x80
>> [   46.053227]  print_usage_bug.part.0+0x22b/0x2c0
>> [   46.053231]  lock_acquire+0x272/0x2b0
>> [   46.053235]  ? __register_nmi_handler+0x83/0x350
>> [   46.053240]  _raw_spin_lock_irqsave+0x39/0x60
>> [   46.053242]  ? __register_nmi_handler+0x83/0x350
>> [   46.053246]  __register_nmi_handler+0x83/0x350
>> [   46.053250]  native_stop_other_cpus+0x31c/0x460
>> [   46.053255]  ? __pfx_native_stop_other_cpus+0x10/0x10
>> [   46.053260]  vpanic+0x1c5/0x3f0
> 
> vpanic() really should disable lockdep here before taking that lock in
> NMI context. The resulting lockdep splat is not really useful.
> 
> Thanks.
> 
>          tglx

Hi Thomas, Peter.


I remember running into this panic once, but it wasn't consistent and I
couldn't hit it again. That run had vCPU overcommit and a fair bit of steal time.


The traces from different CPUs looked like the below.
------------------------

  watchdog: CPU 23 self-detected hard LOCKUP @ mm_get_cid+0xe8/0x188
  watchdog: CPU 23 TB:1434903268401795, last heartbeat TB:1434897252302837 (11750ms ago)
  NIP [c0000000001b7134] mm_get_cid+0xe8/0x188
  LR [c0000000001b7154] mm_get_cid+0x108/0x188
  Call Trace:
  [c000000004c37db0] [c000000001145d84] cpuidle_enter_state+0xf8/0x6a4 (unreliable)
  [c000000004c37e00] [c0000000001b95ac] mm_cid_switch_to+0x3c4/0x52c
  [c000000004c37e60] [c000000001147264] __schedule+0x47c/0x700
  [c000000004c37ee0] [c000000001147a70] schedule_idle+0x3c/0x64
  [c000000004c37f10] [c0000000001f6d70] do_idle+0x160/0x1b0
  [c000000004c37f60] [c0000000001f7084] cpu_startup_entry+0x48/0x50
  [c000000004c37f90] [c00000000005f570] start_secondary+0x284/0x288
  [c000000004c37fe0] [c00000000000e158] start_secondary_prolog+0x10/0x14


  watchdog: CPU 11 self-detected hard LOCKUP @ plpar_hcall_norets_notrace+0x18/0x2c
  watchdog: CPU 11 TB:1434903340004919, last heartbeat TB:1434897249749892 (11895ms ago)
  NIP [c0000000000f84fc] plpar_hcall_norets_notrace+0x18/0x2c
  LR [c000000001152588] queued_spin_lock_slowpath+0xd88/0x15d0
  Call Trace:
  [c00000056b69fb10] [c00000056b69fba0] 0xc00000056b69fba0 (unreliable)
  [c00000056b69fc30] [c000000001153ce0] _raw_spin_lock+0x80/0xa0
  [c00000056b69fc50] [c0000000001b9a34] raw_spin_rq_lock_nested+0x3c/0xf8
  [c00000056b69fc80] [c0000000001b9bb8] mm_cid_fixup_cpus_to_tasks+0xc8/0x28c
  [c00000056b69fd00] [c0000000001bff34] sched_mm_cid_exit+0x108/0x22c
  [c00000056b69fd40] [c000000000167b08] do_exit+0xf4/0x5d0
  [c00000056b69fdf0] [c00000000016800c] make_task_dead+0x0/0x178
  [c00000056b69fe10] [c0000000000316c8] system_call_exception+0x128/0x390
  [c00000056b69fe50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec


  watchdog: CPU 65 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x10ec/0x15d0
  watchdog: CPU 65 TB:1434905824977447, last heartbeat TB:1434899309522065 (12725ms ago)
  NIP [c0000000011528ec] queued_spin_lock_slowpath+0x10ec/0x15d0
  LR [c000000001152d0c] queued_spin_lock_slowpath+0x150c/0x15d0
  Call Trace:
  [c000000777e27a60] [0000000000000009] 0x9 (unreliable)
  [c000000777e27b80] [c000000001153ce0] _raw_spin_lock+0x80/0xa0
  [c000000777e27ba0] [c0000000001b9a34] raw_spin_rq_lock_nested+0x3c/0xf8
  [c000000777e27bd0] [c0000000001babb8] ___task_rq_lock+0x64/0x140
  [c000000777e27c20] [c0000000001c8294] wake_up_new_task+0x180/0x484
  [c000000777e27ca0] [c00000000015bea4] kernel_clone+0x120/0x5bc
  [c000000777e27d30] [c00000000015c4c0] __do_sys_clone+0x88/0xc8
  [c000000777e27e10] [c0000000000316c8] system_call_exception+0x128/0x390
  [c000000777e27e50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec




I am wondering whether it is this loop in mm_get_cid() that may not get a
cid for a long time (presumably while the runqueue lock taken in
__schedule() is still held, which would explain the other CPUs spinning on
it). Is that possible?

static inline unsigned int mm_get_cid(struct mm_struct *mm)
{
         unsigned int cid = __mm_get_cid(mm, READ_ONCE(mm->mm_cid.max_cids));

         while (cid == MM_CID_UNSET) {
                 cpu_relax();
                 cid = __mm_get_cid(mm, num_possible_cpus());
         }
         return cid;
}

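In case it helps to confirm that, an untested debugging hack along the
lines of the below (not part of the series, purely illustrative) could
warn if the loop keeps spinning for more than a second, using
local_clock() as a rough timestamp:

static inline unsigned int mm_get_cid(struct mm_struct *mm)
{
        unsigned int cid = __mm_get_cid(mm, READ_ONCE(mm->mm_cid.max_cids));
        u64 start = local_clock();

        while (cid == MM_CID_UNSET) {
                cpu_relax();
                cid = __mm_get_cid(mm, num_possible_cpus());
                /* Untested: flag if we spin here for more than ~1s */
                WARN_ONCE(local_clock() - start > NSEC_PER_SEC,
                          "mm_get_cid() spinning for more than 1s\n");
        }
        return cid;
}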

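Regarding the vpanic() point above, a minimal sketch of what "disable
lockdep before taking that lock in NMI context" could look like
(untested, and only my reading of the suggestion; debug_locks_off() and
lockdep_off() are existing facilities, where exactly to call them in the
panic path is the open question):

static void panic_silence_lockdep(void)
{
        /*
         * Hypothetical helper, not in the tree: stop lockdep reporting
         * and tracking before the panic path starts taking locks which
         * are also taken from NMI context (e.g. the nmi_desc lock via
         * native_stop_other_cpus() -> __register_nmi_handler()).
         */
        debug_locks_off();
        lockdep_off();
}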