linux-kernel - Re: NMI stack overflow during resume of PCIe bridge with CONFIG_HARDLOCKUP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <99f1aaba32030d2b9285dbd983fdf8518a181a8d.camel@web.de>
Date: Tue, 13 Jan 2026 23:19:30 +0100
From: Bert Karwatzki <spasswolf@....de>
To: Thomas Gleixner <tglx@...nel.org>, linux-kernel@...r.kernel.org
Cc: spasswolf@....de, linux-next@...r.kernel.org, Mario Limonciello	
 <mario.limonciello@....com>, Sebastian Andrzej Siewior
 <bigeasy@...utronix.de>,  Clark Williams <clrkwllms@...nel.org>, Steven
 Rostedt <rostedt@...dmis.org>, Christian König	
 <christian.koenig@....com>, regressions@...ts.linux.dev, 
	linux-pci@...r.kernel.org, linux-acpi@...r.kernel.org, "Rafael J . Wysocki"
	 <rafael.j.wysocki@...el.com>, acpica-devel@...ts.linux.dev, Robert Moore	
 <robert.moore@...el.com>, Saket Dumbre <saket.dumbre@...el.com>, Bjorn
 Helgaas	 <bhelgaas@...gle.com>, Clemens Ladisch <clemens@...isch.de>,
 Jinchao Wang	 <wangjinchao600@...il.com>, Yury Norov
 <yury.norov@...il.com>, Anna Schumaker	 <anna.schumaker@...cle.com>,
 Baoquan He <bhe@...hat.com>, "Darrick J. Wong"	 <djwong@...nel.org>, Dave
 Young <dyoung@...hat.com>, Doug Anderson	 <dianders@...omium.org>,
 "Guilherme G. Piccoli" <gpiccoli@...lia.com>, Helge Deller <deller@....de>,
 Ingo Molnar <mingo@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>,  Joanthan
 Cameron <Jonathan.Cameron@...wei.com>, Joel Granados
 <joel.granados@...nel.org>, John Ogness	 <john.ogness@...utronix.de>, Kees
 Cook <kees@...nel.org>, Li Huafei	 <lihuafei1@...wei.com>, "Luck, Tony"
 <tony.luck@...el.com>, Luo Gengkun	 <luogengkun@...weicloud.com>, Max
 Kellermann <max.kellermann@...os.com>, Nam Cao <namcao@...utronix.de>,
 oushixiong <oushixiong@...inos.cn>, Petr Mladek	 <pmladek@...e.com>,
 Qianqiang Liu <qianqiang.liu@....com>, Sergey Senozhatsky	
 <senozhatsky@...omium.org>, Sohil Mehta <sohil.mehta@...el.com>, Tejun Heo	
 <tj@...nel.org>, Thomas Zimemrmann <tzimmermann@...e.de>, Thorsten Blum	
 <thorsten.blum@...ux.dev>, Ville Syrjala <ville.syrjala@...ux.intel.com>, 
 Vivek Goyal <vgoyal@...hat.com>, Yunhui Cui <cuiyunhui@...edance.com>,
 Andrew Morton	 <akpm@...ux-foundation.org>, W_Armin@....de
Subject: Re: NMI stack overflow during resume of PCIe bridge with
 CONFIG_HARDLOCKUP_DETECTOR=y

Am Dienstag, dem 13.01.2026 um 20:30 +0100 schrieb Thomas Gleixner:
> On Tue, Jan 13 2026 at 18:50, Bert Karwatzki wrote:
> > Am Dienstag, dem 13.01.2026 um 16:24 +0100 schrieb Thomas Gleixner:
> > 
> > >  What's more likely is that after a while _ALL_ CPUs are hung up in
> > > the NMI handler after they tripped over the HPET read.
> > 
> > I'm not sure about that, my latest testrun (with v6.18) crashed with
> > only one message from exc_nmi().
> 
> What means crashed? Did it actually crash and output something or does
> the machine just go dead? I assume the latter as you have no output.
> 
Crashed means, rebooting and trying to reboot into the new kernel, which fails
(i.e. drops to rescue mode)
because one of my 2 nvme devices (not the root fs) is missing from the PCI bus
(the missing device reappears after a shutdown).
When sound is running while such a crash occurs the sound loops for about ~2s,
before such a reboot occurs (a behaviour I've seen during an unrelated NULL ptr deref in an
interrupt handler once). There's no additional error message output when such
a crash occurs, only my printk()s.

> > > along with the full output of /proc/iomem
> > 
> > The physical address is 0xf0100000
> > 
> > $ cat /proc/iomem
> > f0000000-fcffffff : PCI Bus 0000:00
> >   f0000000-f7ffffff : PCI ECAM 0000 [bus 00-7f]
> >     f0000000-f7ffffff : pnp 00:00
> 
> That's the memory mapped PCI config space and this tries to access:
> 
>    MMIO_START			0xf0000000
>    BUSNUM	0x01 << 20      0x00100000
>    SLOT/FN	0x00 << 12      0x00000000
>    OFFSET	0x00 <<  0      0x00000000
>                                -----------
>                                 0xf0100000
> 
> Offset 0 is vendor/device ID IIRC.
> 
> Anyway if that access does not complete because of a hardware issue,
> then any subsequent access to the MMIO mapped HPET goes stale as well.
> 
> As the HPET is the active clocksource on your machine, this obviously
> does not only affect the NMI watchdog readout, it affects the regular
> timekeeper accesses too and all other MMIO accesses all over the place.
> So gradually your machine just stalls on outstanding MMIO transactions
> w/o further notice... The NMI is just a red herring.
> 
I've already tried different clocsources, with mixed results. tsc does not
work on my system as it is unstable, while acpi_pm also crashed. With jiffies
I had mixed results. It might have avoided one crash, but crashed on another
testrun in the same situation.

> You need to figure out why that MMIO access to that device's
> configuration space stalls as anything else is just subsequent
> damage.
> 
> There is not much what can be done about that unless the PCI bus raises
> a failure interrupt and some magic reset sequence aborts the outstanding
> stalled transactions.
> 
> Whether that's feasible or not, I don't know. The failure mechanism
> might run into the same stall scenario when accessing the PCI muck for
> reset...

In the two testruns without CONFIG_HARDLOCKUP_DETECTOR I simply lost the discrete
GPU without a crash, i.e. after 4 and 8.5h the GPU would not resume any more, and
accessing the GPU with DRI_PRIME=1 glxinfo just gives a (user) segfault. But I need
more testruns to see if this behaviour is consistent.

> 
> Sorry for not being helpful.
> 
> Thanks,
> 
>         tglx
> 

Thanks you

Bert Karwatzki