[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <24cde7b7-a03e-4ea0-a75a-e3717e316413@efficios.com>
Date: Mon, 10 Nov 2025 12:05:49 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Prakash Sangappa <prakash.sangappa@...cle.com>,
Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>, Peter Zijlstra
<peterz@...radead.org>, "Paul E. McKenney" <paulmck@...nel.org>,
Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet <corbet@....net>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Arnd Bergmann <arnd@...db.de>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>
Subject: Re: [patch V3 00/12] rseq: Implement time slice extension mechanism
On 2025-11-10 09:23, Mathieu Desnoyers wrote:
> On 2025-11-06 12:28, Prakash Sangappa wrote:
> [...]
>> Hit this watchdog panic.
>>
>> Using following tree. Assume this Is the latest.
>> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/ rseq/
>> slice
>>
>> Appears to be spinning in mm_get_cid(). Must be the mm cid changes.
>> https://lore.kernel.org/all/20251029123717.886619142@linutronix.de/
>
> When this happened during the development of the "complex" mm_cid
> scheme, this was typically caused by a stale "mm_cid" being kept around
> by a task even though it was not actually scheduled, thus causing
> over-reservation of concurrency IDs beyond the max_cids threshold. This
> ends up looping in:
>
> static inline unsigned int mm_get_cid(struct mm_struct *mm)
> {
> unsigned int cid = __mm_get_cid(mm, READ_ONCE(mm-
> >mm_cid.max_cids));
>
> while (cid == MM_CID_UNSET) {
> cpu_relax();
> cid = __mm_get_cid(mm, num_possible_cpus());
> }
> return cid;
> }
>
> Based on the stacktrace you provided, it seems to happen within
> sched_mm_cid_fork() within copy_process, so perhaps it's simply an
> initialization issue in fork, or an issue when cloning a new thread ?
One possible issue here: I note that kernel/sched/core.c:mm_init_cid()
misses the following initialization:
mm->mm_cid.transit = 0;
Thanks,
Mathieu
>
> Thanks,
>
> Mathieu
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Powered by blists - more mailing lists