Message-ID: <20260113211548.GV745888@ziepe.ca>
Date: Tue, 13 Jan 2026 17:15:48 -0400
From: Jason Gunthorpe <jgg@...pe.ca>
To: Thomas Gleixner <tglx@...nel.org>
Cc: Bert Karwatzki <spasswolf@....de>, linux-kernel@...r.kernel.org,
linux-next@...r.kernel.org,
Mario Limonciello <mario.limonciello@....com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Clark Williams <clrkwllms@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Christian König <christian.koenig@....com>,
regressions@...ts.linux.dev, linux-pci@...r.kernel.org,
linux-acpi@...r.kernel.org,
"Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
acpica-devel@...ts.linux.dev, Robert Moore <robert.moore@...el.com>,
Saket Dumbre <saket.dumbre@...el.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Clemens Ladisch <clemens@...isch.de>,
Jinchao Wang <wangjinchao600@...il.com>,
Yury Norov <yury.norov@...il.com>,
Anna Schumaker <anna.schumaker@...cle.com>,
Baoquan He <bhe@...hat.com>, "Darrick J. Wong" <djwong@...nel.org>,
Dave Young <dyoung@...hat.com>,
Doug Anderson <dianders@...omium.org>,
"Guilherme G. Piccoli" <gpiccoli@...lia.com>,
Helge Deller <deller@....de>, Ingo Molnar <mingo@...nel.org>,
Joanthan Cameron <Jonathan.Cameron@...wei.com>,
Joel Granados <joel.granados@...nel.org>,
John Ogness <john.ogness@...utronix.de>,
Kees Cook <kees@...nel.org>, Li Huafei <lihuafei1@...wei.com>,
"Luck, Tony" <tony.luck@...el.com>,
Luo Gengkun <luogengkun@...weicloud.com>,
Max Kellermann <max.kellermann@...os.com>,
Nam Cao <namcao@...utronix.de>, oushixiong <oushixiong@...inos.cn>,
Petr Mladek <pmladek@...e.com>,
Qianqiang Liu <qianqiang.liu@....com>,
Sergey Senozhatsky <senozhatsky@...omium.org>,
Sohil Mehta <sohil.mehta@...el.com>, Tejun Heo <tj@...nel.org>,
Thomas Zimemrmann <tzimmermann@...e.de>,
Thorsten Blum <thorsten.blum@...ux.dev>,
Ville Syrjala <ville.syrjala@...ux.intel.com>,
Vivek Goyal <vgoyal@...hat.com>,
Yunhui Cui <cuiyunhui@...edance.com>,
Andrew Morton <akpm@...ux-foundation.org>, W_Armin@....de
Subject: Re: NMI stack overflow during resume of PCIe bridge with
CONFIG_HARDLOCKUP_DETECTOR=y
On Tue, Jan 13, 2026 at 08:30:46PM +0100, Thomas Gleixner wrote:
> So gradually your machine just stalls on outstanding MMIO transactions
> w/o further notice... The NMI is just a red herring.
CPUs usually have timeouts for these things, and they return all-ones
(0xFF in every byte) for the timed-out read. Beyond that, "it depends"
on the platform whether any other RAS indications are raised.
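A minimal sketch of how software can notice such a timed-out read, assuming the platform completes it with all-ones as described above. The helper names are hypothetical; the idea is similar in spirit to the kernel's PCI_POSSIBLE_ERROR() check. An all-ones value is only a hint, because some registers can legitimately read as all-ones. The Vendor ID at config offset 0x00, however, is never validly 0xFFFF, so it works as a cheap "is the device still answering?" probe:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a timed-out 32-bit config read typically
 * completes with every bit set. Treat this as "possible error" only. */
static bool cfg_read_possible_error32(uint32_t val)
{
	return val == UINT32_MAX;
}

/* Same check for a 16-bit config read. */
static bool cfg_read_possible_error16(uint16_t val)
{
	return val == UINT16_MAX;
}

/* 0xFFFF is not a valid PCI Vendor ID, so reading it back from offset
 * 0x00 suggests the device (or the path to it) is not responding. */
static bool device_responding(uint16_t vendor_id)
{
	return !cfg_read_possible_error16(vendor_id);
}
```

This distinguishes "read timed out or device gone" from real register contents only for fields where all-ones is impossible. For other registers, a follow-up Vendor ID read is the usual way to disambiguate.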
> You need to figure out why that MMIO access to that device's
> configuration space stalls as anything else is just subsequent
> damage.
Given that this is a resume, it seems likely the PCI routing inside the
bridge chip has been messed up somehow during the suspend/resume.
Possibly due to errata in the bridge; there are many weird bridge
errata :\
Jason