Message-ID: <f27baf17-87e2-470e-8d09-ad435331543f@efficios.com>
Date: Wed, 12 Nov 2025 15:40:11 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Prakash Sangappa <prakash.sangappa@...cle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, LKML
<linux-kernel@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>,
"Paul E. McKenney" <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
Jonathan Corbet <corbet@....net>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Arnd Bergmann <arnd@...db.de>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>
Subject: Re: [patch V3 00/12] rseq: Implement time slice extension mechanism
On 2025-11-12 01:30, Prakash Sangappa wrote:
[...]
>
> The problem reproduces on a 2 socket AMD(384 cpus) bare metal system.
> It occurs soon after system boot up. Does not reproduce on a 64cpu VM.
>
> Managed to grep the ‘mksquashfs’ command that was executing, which triggers the panic.
>
> #ps -ef |grep mksquash.
> root 16614 10829 0 05:55 ? 00:00:00 mksquashfs /dev/null /var/tmp/dracut.iLs0z0/.squash-test.img -no-progress -comp xz
>
>
[...]
> ..
> [ 65.143712] cid bitmask ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
> [ 65.143767] pid 16614, exec mksquashfs, maxcids 175 percpu 0 pcputhr 0, users 140 nrcpus_allwd 384
> [ 65.143769] cid bitmask ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
>
It's odd that the cid bitmask is all f's (every bit set). Aren't those
bits zeroed on mm init?
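To put numbers on that, here is a quick user-space sketch (not kernel
code) that counts the set bits in the two "cid bitmask" lines quoted
above. The helper name count_set_bits() and the array names are made up
for illustration; the words are copied from the trace, and the count does
not depend on the order in which the words were printed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count set bits across an array of 64-bit words. */
static unsigned int count_set_bits(const uint64_t *words, size_t nwords)
{
	unsigned int total = 0;

	for (size_t i = 0; i < nwords; i++) {
		uint64_t w = words[i];

		while (w) {
			w &= w - 1;	/* clear the lowest set bit */
			total++;
		}
	}
	return total;
}

/* Words copied from the first "cid bitmask" line of the quoted trace. */
static const uint64_t cid_mask_first[6] = {
	0xffffffffffffffffULL, 0xffffffffffffffffULL, 0xffffffffffffffffULL,
	0xffffffffffffffffULL, 0xffffffffffffffffULL, 0x7fffffffffffffffULL,
};

/* Words copied from the second "cid bitmask" line of the quoted trace. */
static const uint64_t cid_mask_second[6] = {
	0xffffffffffffffffULL, 0xffffffffffffffffULL, 0xffffffffffffffffULL,
	0xffffffffffffffffULL, 0xffffffffffffffffULL, 0xffffffffffffffffULL,
};
```

That gives 383 bits set in the first dump and 384 in the second, which
is every bit for nrcpus_allwd 384, while the same trace reports
maxcids 175 and users 140.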
> Followed by the panic.
> [ 99.979256] watchdog: CPU114: Watchdog detected hard LOCKUP on cpu 114
> ..
[...]
>
> As you can see, at least when it cannot find available cid’s it is in per-task mm cid mode.
> Perhaps it is taking longer to drop used cid’s? I have not delved into the mm cid management.
> Hopeful you can make out something from the above trace.
>
> Let me know if you want me to add more tracing.
How soon after boot does this happen?
I'm starting to wonder whether the num_possible_cpus() value used in
mm_cid_size() and mm_init_cid(), for mm allocation and initialization
respectively, may be read before the boot sequence has finished
initializing it.
That's far-fetched, but it would be good to double-check that those
functions are never called before the last call to init_cpu_possible()
and set_cpu_possible().
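To illustrate why that ordering would matter, here is a minimal
user-space model. It is only a sketch under stated assumptions:
model_bitmap_bytes() and model_sizes_agree() are hypothetical names, and
the model assumes the cid bitmap is sized as one bit per possible CPU
rounded up to 64-bit words, which may not match the actual kernel layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: bytes needed for a bitmap of nbits bits, rounded
 * up to whole 64-bit words. Assumption, not the kernel's actual sizing.
 */
static size_t model_bitmap_bytes(unsigned int nbits)
{
	const unsigned int bits_per_word = 64;

	return ((nbits + bits_per_word - 1) / bits_per_word) * sizeof(uint64_t);
}

/*
 * Hypothetical check: returns 1 if the size computed from the CPU count
 * seen at allocation time matches the size computed from the count seen
 * at initialization time, 0 otherwise.
 */
static int model_sizes_agree(unsigned int ncpus_at_alloc,
			     unsigned int ncpus_at_init)
{
	return model_bitmap_bytes(ncpus_at_alloc) ==
	       model_bitmap_bytes(ncpus_at_init);
}
```

In this model a count of 384 needs 48 bytes while a count of 1 needs
only 8, so an allocation sized from an early, smaller count would not
cover what a later initialization expects to touch.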
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com