Message-ID: <cb2785d6-ee19-4277-9906-d287341c7698@paulmck-laptop>
Date: Wed, 3 Dec 2025 09:16:37 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Borislav Petkov <bp@...en8.de>
Cc: iommu@...ts.linux.dev, Joerg Roedel <joro@...tes.org>,
Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
linux-kernel@...r.kernel.org
Subject: Re: amd iommu: rcu: INFO: rcu_preempt detected expedited stalls on
 CPUs/tasks: { 0-.... } 8 jiffies s: 113 root: 0x1/.

On Wed, Dec 03, 2025 at 01:44:22PM +0100, Borislav Petkov wrote:
> On Fri, Nov 28, 2025 at 12:28:34PM -0800, Paul E. McKenney wrote:
> > Sorry to be slow, USA Turkey Day and all that...
>
> Nothing to be sorry for - email is asynchronous communication. :-P
>
> > This one of course is a stall on CPU 0. But you knew that already.
> >
> > Also, it looks like you have CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 or maybe
> > booted with rcupdate.rcu_exp_cpu_stall_timeout=20 on a system with HZ=250?
> > Or set rcu_exp_cpu_stall_timeout=20 via sysfs?
>
> Not really - this is just me doing "make olddefconfig" on a .config and then
> using it on the test box. I'm simply using the defaults, and I can imagine
> they have changed over the years.
>
> [boris@zn: ~/kernel/configs/brent> grep CONFIG_RCU_EXP_CPU_STALL_TIMEOUT config-6.18.0-rc7+
> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20
>
> Yap, 20 it is.
>
> > This is the beginning of the message.
> >
> > > [ 6.971581] Key type fscrypt-provisioning registered
> > > [ 6.975191] PM: Image not found (code -6)
> > > [ 6.975631] } 8 jiffies s: 89 root: 0x0/.
> >
> > And this is the end. This looks like the stall ended just as the
> > stall-warning message started printing.
>
> I suspected that... Judging by your explanation, I don't think we can stop
> printing empty stall messages - it sounds like they're multiline and the
> pieces come from different places in the code.
>
> To avoid confusion, I mean...

Actually, there are some things that can be done. If you really meant
to test with 20 milliseconds instead of the normal server-class 21
seconds, please let me know and I can see about adjusting. No promises
on schedule, though, LPC being next week and all.

> > It also looks like you have the expedited stall warning set to 20
> > milliseconds, which as far as I know is used only on constrained systems
> > such as smartphones.
>
> That "smartphone" can't possibly fit in my pocket! :-P :-P

Then do you *really* want to be setting the expedited RCU CPU stall
warning to 20 milliseconds? That value was set up specifically for
constrained systems that don't scan huge memories or have huge piles of
running tasks, not for datacenter servers.

> > If you set this value on a typical large server, you will get very large
> > numbers of expedited RCU CPU stall warnings.
>
> Should I reset it to its default 0?

Or to some value that works for you. But if you are not looking to be
an expedited RCU CPU stall-warning pioneer, yes, setting it to zero is
a good approach.

If you would like to be a more sane pioneer, setting it to (say) 11000
(or 11 seconds) could be appropriate. But what fun is sanity? ;-)
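
For reference, and from memory (so please double-check the exact names
against your kernel), that knob can be set in three places, with the
default of zero that you mentioned:

	# Build time, in .config:
	CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0

	# Boot time, on the kernel command line:
	rcupdate.rcu_exp_cpu_stall_timeout=0

	# Run time (as root), via the module parameter in sysfs:
	echo 0 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout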

> And for that other value I have there:
>
> config RCU_CPU_STALL_TIMEOUT
> int "RCU CPU stall timeout in seconds"
> depends on RCU_STALL_COMMON
> range 3 300
> default 21
>
> which is weird. I guess I need to reset all those to something sensible for
> a server...

No, not weird: note "seconds" there, not "milliseconds".
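
And just to make the units contrast explicit: CONFIG_RCU_CPU_STALL_TIMEOUT
is in seconds while CONFIG_RCU_EXP_CPU_STALL_TIMEOUT is in milliseconds,
so rough parity with the 21-second normal-grace-period default would look
something like this (a sketch, not a recommendation):

	CONFIG_RCU_CPU_STALL_TIMEOUT=21
	CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000

The =20 you currently have is instead twenty milliseconds, roughly a
thousand times more trigger-happy.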

> > Oh, and if you are running with HZ=1000 and the expedited RCU CPU stall
> > warning set to 20 milliseconds (let alone 8!), then as far as I know,
> > you are a pioneer breaking new ground. ;-)
>
> I do things like that from time to time...
>
> But nah, it is 250:
>
> # CONFIG_HZ_PERIODIC is not set
> # CONFIG_HZ_100 is not set
> CONFIG_HZ_250=y
> # CONFIG_HZ_300 is not set
> # CONFIG_HZ_1000 is not set
> CONFIG_HZ=250

OK, then all those "8 jiffies" in your console log make sense. ;-)
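
Back-of-the-envelope, in case the arithmetic helps:

	1 jiffy at HZ=250:   1000/250 = 4 milliseconds
	20 ms timeout:       20/4     = 5 jiffies
	8 jiffies:           8*4      = 32 milliseconds

So a stall that gets reported at 8 jiffies has been in progress for all of
32 milliseconds, which is exactly the sort of thing a 20-millisecond
threshold will complain about on a big server.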

							Thanx, Paul

> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette