linux-kernel - Re: [patch V3 00/12] rseq: Implement time slice extension mechanism

Message-ID: <2eee5e37-e541-4ac7-9479-cef3e62f234d@efficios.com>
Date: Mon, 10 Nov 2025 09:23:20 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Prakash Sangappa <prakash.sangappa@...cle.com>,
 Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>, Peter Zijlstra
 <peterz@...radead.org>, "Paul E. McKenney" <paulmck@...nel.org>,
 Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet <corbet@....net>,
 Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
 K Prateek Nayak <kprateek.nayak@....com>,
 Steven Rostedt <rostedt@...dmis.org>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 Arnd Bergmann <arnd@...db.de>,
 "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>
Subject: Re: [patch V3 00/12] rseq: Implement time slice extension mechanism

On 2025-11-06 12:28, Prakash Sangappa wrote:
[...]
> Hit this watchdog panic.
> 
> Using the following tree. I assume this is the latest.
> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/ rseq/slice
> 
> Appears to be spinning in mm_get_cid(). Must be the mm cid changes.
> https://lore.kernel.org/all/20251029123717.886619142@linutronix.de/

When this happened during the development of the "complex" mm_cid
scheme, this was typically caused by a stale "mm_cid" being kept around
by a task even though it was not actually scheduled, thus causing
over-reservation of concurrency IDs beyond the max_cids threshold. This
ends up looping in:

static inline unsigned int mm_get_cid(struct mm_struct *mm)
{
         unsigned int cid = __mm_get_cid(mm, READ_ONCE(mm->mm_cid.max_cids));

         while (cid == MM_CID_UNSET) {
                 cpu_relax();
                 cid = __mm_get_cid(mm, num_possible_cpus());
         }
         return cid;
}
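
As a rough illustration of that failure mode outside the kernel, here is a
minimal single-threaded userspace sketch. It is not the kernel implementation:
struct cid_pool, pool_get_cid(), pool_put_cid(), pool_get_cid_bounded() and
NR_CIDS are hypothetical names made up for this example, and the retry loop is
bounded so that it reports failure instead of spinning until the watchdog fires.

```c
#include <assert.h>
#include <stdbool.h>

#define CID_UNSET (~0U) /* stand-in for MM_CID_UNSET */
#define NR_CIDS   8U    /* hypothetical stand-in for num_possible_cpus() */

/* Hypothetical model of a per-mm concurrency ID pool. */
struct cid_pool {
	bool used[NR_CIDS];    /* used[cid] is true while some task holds cid */
	unsigned int max_cids; /* number of CIDs that should ever be needed */
};

/* Claim the lowest free CID below limit, or return CID_UNSET if none is free. */
static unsigned int pool_get_cid(struct cid_pool *p, unsigned int limit)
{
	if (limit > NR_CIDS)
		limit = NR_CIDS;
	for (unsigned int cid = 0; cid < limit; cid++) {
		if (!p->used[cid]) {
			p->used[cid] = true;
			return cid;
		}
	}
	return CID_UNSET;
}

/* Release a CID. A task that skips this when descheduled holds a stale CID. */
static void pool_put_cid(struct cid_pool *p, unsigned int cid)
{
	assert(cid < NR_CIDS && p->used[cid]);
	p->used[cid] = false;
}

/*
 * Same shape as the retry loop in mm_get_cid(), but gives up after
 * max_tries attempts so the stuck case can be observed.
 */
static unsigned int pool_get_cid_bounded(struct cid_pool *p,
					 unsigned int max_tries)
{
	unsigned int cid = pool_get_cid(p, p->max_cids);

	for (unsigned int i = 0; cid == CID_UNSET && i < max_tries; i++)
		cid = pool_get_cid(p, NR_CIDS);
	return cid;
}
```

In this model, once NR_CIDS tasks each hold a CID without releasing it, any
further caller of pool_get_cid_bounded() gets CID_UNSET no matter how often it
retries, which corresponds to the endless loop above. As soon as one stale
holder calls pool_put_cid(), the next request succeeds.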

Based on the stacktrace you provided, it seems to happen in
sched_mm_cid_fork(), called from copy_process(), so perhaps it's simply an
initialization issue in fork, or an issue when cloning a new thread?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com