Message-ID: <17147c98-9f0d-45d0-9593-3ede27ba6135@paulmck-laptop>
Date: Fri, 28 Nov 2025 12:28:34 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Borislav Petkov <bp@...en8.de>
Cc: iommu@...ts.linux.dev, Joerg Roedel <joro@...tes.org>,
Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
linux-kernel@...r.kernel.org
Subject: Re: amd iommu: rcu: INFO: rcu_preempt detected expedited stalls on
CPUs/tasks: { 0-.... } 8 jiffies s: 113 root: 0x1/.
Sorry to be slow, USA Turkey Day and all that...
On Wed, Nov 26, 2025 at 04:26:37PM +0100, Borislav Petkov wrote:
> On Wed, Nov 26, 2025 at 03:32:19PM +0100, Borislav Petkov wrote:
> > Hi,
> >
> > this is latest Linus + latest tip/master. Box is Zen3. CCing AMD IOMMU folks
> > because the backtrace points to it.
> >
> > Ideas?
> >
> > [ 12.946913] (journald)[506]: Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy
> > [ 12.948083] (journald)[506]: Successfully forked off '(sd-mkuserns)' as PID 507.
> > [ 12.977579] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-.... } 8 jiffies s: 113 root: 0x1/.
> > [ 12.983638] rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x1/.
This one of course is a stall on CPU 0. But you knew that already.
Also, it looks like you have CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 or maybe
booted with rcupdate.rcu_exp_cpu_stall_timeout=20 on a system with HZ=250?
Or set rcu_exp_cpu_stall_timeout=20 via sysfs?
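To check which of these applies, a minimal sketch along the following lines could read the current value at run time. It assumes the parameter is exposed at /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout, which is where an rcupdate.* module parameter would normally appear; the helper name read_exp_stall_timeout_ms is made up for illustration.

```python
from pathlib import Path

# Assumed sysfs location of the rcupdate.rcu_exp_cpu_stall_timeout
# parameter; adjust if your kernel exposes it elsewhere.
PARAM = Path("/sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout")


def read_exp_stall_timeout_ms(path=PARAM):
    """Return the expedited stall timeout in milliseconds, or None.

    None means the file is missing, unreadable, or not an integer
    (hypothetical helper, not part of any kernel tooling).
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_exp_stall_timeout_ms()
    if value is None:
        print("parameter not available on this system")
    else:
        print(f"rcu_exp_cpu_stall_timeout = {value} ms")
```

If the value read back differs from the Kconfig default, it was presumably set on the kernel command line or written through sysfs after boot.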
> And as suspected, booting in it again, it doesn't trigger anymore. But there's
> something new in dmesg which looks weird and makes me want to Cc Paul:
>
> [ 6.965526] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {
This is the beginning of the message.
> [ 6.971581] Key type fscrypt-provisioning registered
> [ 6.975191] PM: Image not found (code -6)
> [ 6.975631] } 8 jiffies s: 89 root: 0x0/.
And this is the end. This looks like the stall ended just as the
stall-warning message started printing.
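For sorting through a longer dmesg capture, here is a rough sketch that stitches such split messages back together and flags the ones with nothing between the braces. The regular expressions are inferred only from the lines quoted in this thread, and the function names are made up, so treat it as an illustration rather than a general-purpose parser.

```python
import re

# Patterns inferred from the messages quoted in this thread (assumption).
START = re.compile(r"rcu: INFO: \w+ detected expedited stalls on CPUs/tasks: \{(.*)$")
END = re.compile(r"^(.*?)\}\s*(\d+) jiffies s: (\d+) root: (\S+)")


def strip_ts(line):
    """Drop a leading '[ 12.345678] ' dmesg timestamp, if present."""
    return re.sub(r"^\s*\[\s*\d+\.\d+\]\s*", "", line)


def stall_reports(lines):
    """Collect expedited stall warnings, tolerating interleaved lines."""
    reports, inside = [], False
    for raw in lines:
        text = strip_ts(raw)
        if not inside:
            m = START.search(text)
            if not m:
                continue
            inside, text = True, m.group(1)
        m = END.match(text)
        if m:
            reports.append({
                "entries": m.group(1).split(),  # empty list: nothing in { }
                "jiffies": int(m.group(2)),
                "s": int(m.group(3)),
                "root": m.group(4),
            })
            inside = False
        # Otherwise this is an unrelated line printed mid-message: skip it.
    return reports
```

A report whose "entries" list is empty corresponds to the "{ }" case above, where no CPU or task was still blocking by the time the list was printed.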
> and
>
> [ 12.549532] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 8 jiffies s: 113 root: 0x1/.
> [ 12.550863] rcu: blocking rcu_node structures (internal RCU debug):
This is a stall on CPU 2.
>
> [ 12.817601] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {
> [ 12.819773] (sd-mkdcre[520]: Credential search path is: /etc/credstore:/run/credstore:/usr/local/lib/credstore:/usr/lib/credstore
> [ 12.827074] } 8 jiffies s: 129 root: 0x0/.
>
> [ 12.881508] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {
> [ 12.892854] (sd-mkdcr[522]: Credential search path is: /etc/credstore.encrypted:/run/credstore.encrypted:/usr/local/lib/credstore.encrypted:/usr/lib/credstore.encrypted
> [ 12.905244] } 8 jiffies s: 133 root: 0x0/.
>
> Paul, this looks weird.
>
> Why is that issuing empty lists between the { }?
Again, my guess is that the stall is ending just as the print starts.
It also looks like you have the expedited stall-warning timeout set to 20
milliseconds, which, as far as I know, is used only on constrained systems
such as smartphones. If you set this value on a typical large server,
you will get very large numbers of expedited RCU CPU stall warnings.
Oh, and if you are running with HZ=1000 and the expedited RCU CPU stall
warning set to 20 milliseconds (let alone 8!), then as far as I know,
you are a pioneer breaking new ground. ;-)
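For reference, the arithmetic behind those numbers can be sketched as below. This is plain unit conversion, assuming millisecond-to-jiffy conversion rounds up; the helper names simply mirror the kernel's for readability. The kernel may clamp the timeout or add debugging slack on top of this, so the jiffies reported in a stall warning need not match the configured timeout exactly.

```python
def msecs_to_jiffies(ms, hz):
    """Convert milliseconds to jiffies, rounding up (assumed behavior)."""
    return -(-ms * hz // 1000)


def jiffies_to_msecs(jiffies, hz):
    """Convert jiffies to milliseconds, rounding down."""
    return jiffies * 1000 // hz


if __name__ == "__main__":
    for hz in (250, 1000):
        print(f"HZ={hz}: 20 ms = {msecs_to_jiffies(20, hz)} jiffies, "
              f"8 jiffies = {jiffies_to_msecs(8, hz)} ms")
```

So the "8 jiffies" in the warnings would be 32 milliseconds with HZ=250 but only 8 milliseconds with HZ=1000.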
Thanx, Paul
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette