Message-ID: <bc20529d7520e7db7de2022bf9c96a1bc3a2f0df.camel@web.de>
Date: Tue, 13 Jan 2026 18:50:24 +0100
From: Bert Karwatzki <spasswolf@....de>
To: Thomas Gleixner <tglx@...nel.org>, linux-kernel@...r.kernel.org
Cc: linux-next@...r.kernel.org, spasswolf@....de, Mario Limonciello
<mario.limonciello@....com>, Sebastian Andrzej Siewior
<bigeasy@...utronix.de>, Clark Williams <clrkwllms@...nel.org>, Steven
Rostedt <rostedt@...dmis.org>, Christian König
<christian.koenig@....com>, regressions@...ts.linux.dev,
linux-pci@...r.kernel.org, linux-acpi@...r.kernel.org, "Rafael J . Wysocki"
<rafael.j.wysocki@...el.com>, acpica-devel@...ts.linux.dev, Robert Moore
<robert.moore@...el.com>, Saket Dumbre <saket.dumbre@...el.com>, Bjorn
Helgaas <bhelgaas@...gle.com>, Clemens Ladisch <clemens@...isch.de>,
Jinchao Wang <wangjinchao600@...il.com>, Yury Norov
<yury.norov@...il.com>, Anna Schumaker <anna.schumaker@...cle.com>,
Baoquan He <bhe@...hat.com>, "Darrick J. Wong" <djwong@...nel.org>, Dave
Young <dyoung@...hat.com>, Doug Anderson <dianders@...omium.org>,
"Guilherme G. Piccoli" <gpiccoli@...lia.com>, Helge Deller <deller@....de>,
Ingo Molnar <mingo@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>, Jonathan
Cameron <Jonathan.Cameron@...wei.com>, Joel Granados
<joel.granados@...nel.org>, John Ogness <john.ogness@...utronix.de>, Kees
Cook <kees@...nel.org>, Li Huafei <lihuafei1@...wei.com>, "Luck, Tony"
<tony.luck@...el.com>, Luo Gengkun <luogengkun@...weicloud.com>, Max
Kellermann <max.kellermann@...os.com>, Nam Cao <namcao@...utronix.de>,
oushixiong <oushixiong@...inos.cn>, Petr Mladek <pmladek@...e.com>,
Qianqiang Liu <qianqiang.liu@....com>, Sergey Senozhatsky
<senozhatsky@...omium.org>, Sohil Mehta <sohil.mehta@...el.com>, Tejun Heo
<tj@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>, Thorsten Blum
<thorsten.blum@...ux.dev>, Ville Syrjala <ville.syrjala@...ux.intel.com>,
Vivek Goyal <vgoyal@...hat.com>, Yunhui Cui <cuiyunhui@...edance.com>,
Andrew Morton <akpm@...ux-foundation.org>, W_Armin@....de
Subject: Re: NMI stack overflow during resume of PCIe bridge with
CONFIG_HARDLOCKUP_DETECTOR=y
On Tuesday, 13.01.2026 at 16:24 +0100, Thomas Gleixner wrote:
> On Tue, Jan 13 2026 at 10:41, Bert Karwatzki wrote:
> > Here's the result in case of the crash:
> > 2026-01-12T04:24:36.809904+01:00 T1510;acpi_ex_system_memory_space_handler 255: logical_addr_ptr = ffffc066977b3000
> > 2026-01-12T04:24:36.846170+01:00 C14;exc_nmi: 0
>
> Here the NMI triggers in non-task context on CPU14
>
> > 2026-01-12T04:24:36.960760+01:00 C14;exc_nmi: 10.3
> > 2026-01-12T04:24:36.960760+01:00 C14;default_do_nmi
> > 2026-01-12T04:24:36.960760+01:00 C14;nmi_handle: type=0x0
> > 2026-01-12T04:24:36.960760+01:00 C14;nmi_handle: a=0xffffffffa1612de0
> > 2026-01-12T04:24:36.960760+01:00 C14;nmi_handle: a->handler=perf_event_nmi_handler+0x0/0xa6
> > 2026-01-12T04:24:36.960760+01:00 C14;perf_event_nmi_handler: 0
> > 2026-01-12T04:24:36.960760+01:00 C14;perf_event_nmi_handler: 1
> > 2026-01-12T04:24:36.960760+01:00 C14;perf_event_nmi_handler: 2
> > 2026-01-12T04:24:36.960760+01:00 C14;x86_pmu_handle_irq: 2
> > 2026-01-12T04:24:36.960760+01:00 C14;x86_pmu_handle_irq: 2.6
> > 2026-01-12T04:24:36.960760+01:00 C14;__perf_event_overflow: 0
> > 2026-01-12T04:24:36.960760+01:00 C14;__perf_event_overflow: 6.99: overflow_handler=watchdog_overflow_callback+0x0/0x10d
> > 2026-01-12T04:24:36.960760+01:00 C14;watchdog_overflow_callback: 0
> > 2026-01-12T04:24:36.960760+01:00 C14;__ktime_get_fast_ns_debug: 0.1
> > 2026-01-12T04:24:36.960760+01:00 C14;tk_clock_read_debug: read=read_hpet+0x0/0xf0
> > 2026-01-12T04:24:36.960760+01:00 C14;read_hpet: 0
> > 2026-01-12T04:24:36.960760+01:00 C14;read_hpet: 0.1
>
> > 2026-01-12T04:24:36.960760+01:00 T0;exc_nmi: 0
>
> This one triggers in task context of PID0, aka idle task, but it's not
> clear on which CPU that happens. It's probably CPU13 as that continues
> with the expected 10.3 output, but that's almost ~1.71 seconds later.
>
The long delays seem to be typical for the first NMI after trying to access
the broken memory at phys_addr 0xf0100000. Here's an example from an earlier
run with more printk()s in that part of the code (too many printk()s seem to
cause additional system freezes ...):
2026-01-03T14:10:10.312182+01:00 T1511;acpi_ex_system_memory_space_handler 255: logical_addr_ptr = ffffbaa49c15d000
2026-01-03T14:10:10.616281+01:00 T0;exc_nmi: 0
2026-01-03T14:10:10.616281+01:00 T0;exc_nmi: 1
2026-01-03T14:10:10.616281+01:00 T0;exc_nmi: 2
2026-01-03T14:10:10.616281+01:00 T0;exc_nmi: 3
2026-01-03T14:10:10.616281+01:00 T0;exc_nmi: 4
2026-01-03T14:10:10.616281+01:00 T0;exc_nmi: 5
2026-01-03T14:10:10.616281+01:00 T0;exc_nmi: 6
2026-01-03T14:10:10.616281+01:00 T0;exc_nmi: 7
2026-01-03T14:10:10.616281+01:00 T0;irqentry_nmi_enter: 0
2026-01-03T14:10:10.616281+01:00 T0;irqentry_nmi_enter: 1
2026-01-03T14:10:11.055800+01:00 C8;irqentry_nmi_enter: 2
2026-01-03T14:10:11.055800+01:00 C8;irqentry_nmi_enter: 3
2026-01-03T14:10:11.055800+01:00 C8;irqentry_nmi_enter: 4
2026-01-03T14:10:11.055800+01:00 C8;irqentry_nmi_enter: 5
2026-01-03T14:10:11.055800+01:00 C8;irqentry_nmi_enter: irq_state=0x0
2026-01-03T14:10:11.055800+01:00 C8;exc_nmi: 8
2026-01-03T14:10:11.055800+01:00 C8;exc_nmi: 9
2026-01-03T14:10:11.055800+01:00 C8;exc_nmi: 10.3
The printk()s in irqentry_nmi_enter() were placed as follows:
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index e33691d5adf7..42cba2ea7aa1 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -370,12 +370,18 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 {
 	irqentry_state_t irq_state;
 
+	printk(KERN_INFO "%s: 0\n", __func__);
 	irq_state.lockdep = lockdep_hardirqs_enabled();
 
+	printk(KERN_INFO "%s: 1\n", __func__);
 	__nmi_enter();
+	printk(KERN_INFO "%s: 2\n", __func__);
 	lockdep_hardirqs_off(CALLER_ADDR0);
+	printk(KERN_INFO "%s: 3\n", __func__);
 	lockdep_hardirq_enter();
+	printk(KERN_INFO "%s: 4\n", __func__);
 	ct_nmi_enter();
+	printk(KERN_INFO "%s: 5\n", __func__);
 
 	instrumentation_begin();
 	kmsan_unpoison_entry_regs(regs);
@@ -383,6 +389,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 	ftrace_nmi_enter();
 	instrumentation_end();
+	printk(KERN_INFO "%s: irq_state=0x%x\n", __func__, irq_state);
 
 	return irq_state;
 }
 
> What's more likely is that after a while
> _ALL_ CPUs are hung up in the NMI handler after they tripped over the
> HPET read.
I'm not sure about that; my latest test run (with v6.18) crashed with only one
message from exc_nmi().
>
> > The behaviour described here seems to be similar to the bug that commit
> > 3d5f4f15b778 ("watchdog: skip checks when panic is in progress") is fixing,
> > but this is actually a different bug, as kernel 6.18 (which contains
> > 3d5f4f15b778) is also affected (I've conducted 5 tests with 6.18 so far and
> > got 4 crashes, which occurred after 0.5h, 1h, 4.5h and 1.5h of testing).
> > Nevertheless these look similar enough to CC the involved people.
>
> There is nothing similar.
>
> Your problem originates from a screwed up hardware state which in turn
> causes the HPET to go haywire for unknown reasons.
>
> What is the physical address of this ACPI handler access:
>
> logical_addr_ptr = ffffc066977b3000
>
> along with the full output of /proc/iomem
The physical address is 0xf0100000.
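It is simply the address argument that acpi_ex_system_memory_space_handler()
receives; a printk() along these lines in drivers/acpi/acpica/exregion.c is
enough to see it next to logical_addr_ptr (a sketch, not the exact debug line
used):

	/* Sketch: print the physical target of the AML SystemMemory access
	 * together with the mapped kernel virtual pointer. */
	printk(KERN_INFO "%s: logical_addr_ptr = %px address = 0x%llx\n",
	       __func__, logical_addr_ptr, (unsigned long long)address);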
$ cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
  000a0000-000dffff : PCI Bus 0000:00
  000f0000-000fffff : System ROM
00100000-09bfefff : System RAM
09bff000-0a000fff : Reserved
0a001000-0a1fffff : System RAM
0a200000-0a20efff : ACPI Non-volatile Storage
0a20f000-e6057fff : System RAM
  15000000-15b252c1 : Kernel code
  15c00000-15f60fff : Kernel rodata
  16000000-1610e27f : Kernel data
  165ce000-167fffff : Kernel bss
  9c000000-dbffffff : Crash kernel
e6058000-e614bfff : Reserved
e614c000-e868afff : System RAM
e868b000-e868bfff : Reserved
e868c000-e9cdefff : System RAM
e9cdf000-eb1fdfff : Reserved
  eb1dd000-eb1e0fff : MSFT0101:00
  eb1e1000-eb1e4fff : MSFT0101:00
eb1fe000-eb25dfff : ACPI Tables
eb25e000-eb555fff : ACPI Non-volatile Storage
eb556000-ed1fefff : Reserved
ed1ff000-edffffff : System RAM
ee000000-efffffff : Reserved
f0000000-fcffffff : PCI Bus 0000:00
  f0000000-f7ffffff : PCI ECAM 0000 [bus 00-7f]
    f0000000-f7ffffff : pnp 00:00
  fc500000-fc9fffff : PCI Bus 0000:08
    fc500000-fc5fffff : 0000:08:00.7
      fc500000-fc5fffff : pcie_mp2_amd
    fc600000-fc6fffff : 0000:08:00.4
      fc600000-fc6fffff : xhci-hcd
    fc700000-fc7fffff : 0000:08:00.3
      fc700000-fc7fffff : xhci-hcd
    fc800000-fc8fffff : 0000:08:00.2
      fc800000-fc8fffff : ccp
    fc900000-fc97ffff : 0000:08:00.0
    fc980000-fc9bffff : 0000:08:00.5
      fc980000-fc9bffff : AMD ACP3x audio
        fc980000-fc990200 : acp_pdm_iomem
    fc9c0000-fc9c7fff : 0000:08:00.6
      fc9c0000-fc9c7fff : ICH HD audio
    fc9c8000-fc9cbfff : 0000:08:00.1
      fc9c8000-fc9cbfff : ICH HD audio
    fc9cc000-fc9cdfff : 0000:08:00.7
    fc9ce000-fc9cffff : 0000:08:00.2
      fc9ce000-fc9cffff : ccp
  fca00000-fccfffff : PCI Bus 0000:01
    fca00000-fcbfffff : PCI Bus 0000:02
      fca00000-fcbfffff : PCI Bus 0000:03
        fca00000-fcafffff : 0000:03:00.0
        fcb00000-fcb1ffff : 0000:03:00.0
        fcb20000-fcb23fff : 0000:03:00.1
          fcb20000-fcb23fff : ICH HD audio
    fcc00000-fcc03fff : 0000:01:00.0
  fcd00000-fcdfffff : PCI Bus 0000:07
    fcd00000-fcd03fff : 0000:07:00.0
      fcd00000-fcd03fff : nvme
  fce00000-fcefffff : PCI Bus 0000:06
    fce00000-fce03fff : 0000:06:00.0
      fce00000-fce03fff : nvme
  fcf00000-fcffffff : PCI Bus 0000:05
    fcf00000-fcf03fff : 0000:05:00.0
    fcf04000-fcf04fff : 0000:05:00.0
      fcf04000-fcf04fff : r8169
fd300000-fd37ffff : amd_iommu
fec00000-fec003ff : IOAPIC 0
fec01000-fec013ff : IOAPIC 1
fec10000-fec10fff : Reserved
  fec10000-fec10fff : pnp 00:04
fed00000-fed00fff : Reserved
  fed00000-fed003ff : HPET 0
    fed00000-fed003ff : PNP0103:00
fed40000-fed44fff : Reserved
fed80000-fed8ffff : Reserved
  fed81200-fed812ff : AMDI0030:00
  fed81500-fed818ff : AMDI0030:00
    fed81500-fed818ff : AMDI0030:00 AMDI0030:00
fedc0000-fedc0fff : pnp 00:04
fedc4000-fedc9fff : Reserved
  fedc5000-fedc5fff : AMDI0010:03
    fedc5000-fedc5fff : AMDI0010:03 AMDI0010:03
fedcc000-fedcefff : Reserved
fedd5000-fedd5fff : Reserved
fee00000-fee00fff : pnp 00:04
ff000000-ffffffff : pnp 00:04
100000000-3ee2fffff : System RAM
3ee300000-40fffffff : Reserved
410000000-ffffffffff : PCI Bus 0000:00
  fc00000000-fe0fffffff : PCI Bus 0000:01
    fc00000000-fe0fffffff : PCI Bus 0000:02
      fc00000000-fe0fffffff : PCI Bus 0000:03
        fc00000000-fdffffffff : 0000:03:00.0
        fe00000000-fe0fffffff : 0000:03:00.0
  fe20000000-fe301fffff : PCI Bus 0000:08
    fe20000000-fe2fffffff : 0000:08:00.0
    fe30000000-fe301fffff : 0000:08:00.0
  fe30300000-fe304fffff : PCI Bus 0000:04
    fe30300000-fe303fffff : 0000:04:00.0
      fe30300000-fe303fffff : 0000:04:00.0
    fe30400000-fe30403fff : 0000:04:00.0
    fe30404000-fe30404fff : 0000:04:00.0
>
> Thanks,
>
> tglx
Thank you,
Bert Karwatzki