Message-ID: <20260113211548.GV745888@ziepe.ca>
Date: Tue, 13 Jan 2026 17:15:48 -0400
From: Jason Gunthorpe <jgg@...pe.ca>
To: Thomas Gleixner <tglx@...nel.org>
Cc: Bert Karwatzki <spasswolf@....de>, linux-kernel@...r.kernel.org,
	linux-next@...r.kernel.org,
	Mario Limonciello <mario.limonciello@....com>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	Clark Williams <clrkwllms@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Christian König <christian.koenig@....com>,
	regressions@...ts.linux.dev, linux-pci@...r.kernel.org,
	linux-acpi@...r.kernel.org,
	"Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
	acpica-devel@...ts.linux.dev, Robert Moore <robert.moore@...el.com>,
	Saket Dumbre <saket.dumbre@...el.com>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	Clemens Ladisch <clemens@...isch.de>,
	Jinchao Wang <wangjinchao600@...il.com>,
	Yury Norov <yury.norov@...il.com>,
	Anna Schumaker <anna.schumaker@...cle.com>,
	Baoquan He <bhe@...hat.com>, "Darrick J. Wong" <djwong@...nel.org>,
	Dave Young <dyoung@...hat.com>,
	Doug Anderson <dianders@...omium.org>,
	"Guilherme G. Piccoli" <gpiccoli@...lia.com>,
	Helge Deller <deller@....de>, Ingo Molnar <mingo@...nel.org>,
	Jonathan Cameron <Jonathan.Cameron@...wei.com>,
	Joel Granados <joel.granados@...nel.org>,
	John Ogness <john.ogness@...utronix.de>,
	Kees Cook <kees@...nel.org>, Li Huafei <lihuafei1@...wei.com>,
	"Luck, Tony" <tony.luck@...el.com>,
	Luo Gengkun <luogengkun@...weicloud.com>,
	Max Kellermann <max.kellermann@...os.com>,
	Nam Cao <namcao@...utronix.de>, oushixiong <oushixiong@...inos.cn>,
	Petr Mladek <pmladek@...e.com>,
	Qianqiang Liu <qianqiang.liu@....com>,
	Sergey Senozhatsky <senozhatsky@...omium.org>,
	Sohil Mehta <sohil.mehta@...el.com>, Tejun Heo <tj@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	Thorsten Blum <thorsten.blum@...ux.dev>,
	Ville Syrjala <ville.syrjala@...ux.intel.com>,
	Vivek Goyal <vgoyal@...hat.com>,
	Yunhui Cui <cuiyunhui@...edance.com>,
	Andrew Morton <akpm@...ux-foundation.org>, W_Armin@....de
Subject: Re: NMI stack overflow during resume of PCIe bridge with
 CONFIG_HARDLOCKUP_DETECTOR=y

On Tue, Jan 13, 2026 at 08:30:46PM +0100, Thomas Gleixner wrote:
> So gradually your machine just stalls on outstanding MMIO transactions
> w/o further notice... The NMI is just a red herring.

CPUs usually have timeouts for these things and return all ones (0xFF
in every byte) for the timed-out read. Beyond that, "it depends" on
the platform whether any other RAS indications are raised.
 
> You need to figure out why that MMIO access to that device's
> configuration space stalls as anything else is just subsequent
> damage.

Given this is a resume it seems likely the PCI routing inside the
bridge chip has been messed up somehow during the suspend/resume.

Possibly due to errata in the bridge; there are many weird bridge
errata :\

Jason
