Message-ID: <20251203124422.GBaTAwpmOzChS2MDhX@fat_crate.local>
Date: Wed, 3 Dec 2025 13:44:22 +0100
From: Borislav Petkov <bp@...en8.de>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: iommu@...ts.linux.dev, Joerg Roedel <joro@...tes.org>,
Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
linux-kernel@...r.kernel.org
Subject: Re: amd iommu: rcu: INFO: rcu_preempt detected expedited stalls on
CPUs/tasks: { 0-.... } 8 jiffies s: 113 root: 0x1/.
On Fri, Nov 28, 2025 at 12:28:34PM -0800, Paul E. McKenney wrote:
> Sorry to be slow, USA Turkey Day and all that...
Nothing to be sorry for - email is asynchronous communication. :-P
> This one of course is a stall on CPU 0. But you knew that already.
>
> Also, it looks like you have CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 or maybe
> booted with rcupdate.rcu_exp_cpu_stall_timeout=20 on a system with HZ=250?
> Or set rcu_exp_cpu_stall_timeout=20 via sysfs?
Not really - this is me simply doing "make olddefconfig" on a .config and then
using it on the test box. I'm just taking the defaults, and I can imagine they
have changed over the years.
[boris@zn: ~/kernel/configs/brent> grep CONFIG_RCU_EXP_CPU_STALL_TIMEOUT config-6.18.0-rc7+
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20
Yap, 20 it is.
> This is the beginning of the message.
>
> > [ 6.971581] Key type fscrypt-provisioning registered
> > [ 6.975191] PM: Image not found (code -6)
> > [ 6.975631] } 8 jiffies s: 89 root: 0x0/.
>
> And this is the end. This looks like the stall ended just as the
> stall-warning message started printing.
I suspected as much... Judging by your explanation, I don't think we can stop
printing these seemingly empty stall messages: it sounds like they're
multi-line and the pieces get printed from different places in the code.
(I mean stopping them in order to avoid confusion...)
> It also looks like you have the expedited stall warning set to 20
> milliseconds, which as far as I know is used only on constrained systems
> such as smartphones.
That "smartphone" can't possibly fit in my pocket! :-P :-P
> If you set this value on a typical large server, you will get very large
> numbers of expedited RCU CPU stall warnings.
Should I reset it to its default 0?
And for that other value I have there:
config RCU_CPU_STALL_TIMEOUT
	int "RCU CPU stall timeout in seconds"
	depends on RCU_STALL_COMMON
	range 3 300
	default 21
which is weird. I guess I need to reset all of those to something sensible for
a server...
> Oh, and if you are running with HZ=1000 and the expedited RCU CPU stall
> warning set to 20 milliseconds (let alone 8!), then as far as I know,
> you are a pioneer breaking new ground. ;-)
I do things like that from time to time...
But nah, it is 250:
# CONFIG_HZ_PERIODIC is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette