Message-ID: <20240424185806.GB101235@kernel.org>
Date: Wed, 24 Apr 2024 14:58:06 -0400
From: Paul Gortmaker <paulg@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Borislav Petkov <bp@...en8.de>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Richard Purdie <richard.purdie@...uxfoundation.org>
Subject: Intermittent Qemu boot hang/regression traced back to INT 0x80
changes
Richard (via the Yocto auto-builder) reported a sporadic hang (roughly
once every few hundred boots) while the PCI bus is being scanned at boot
on v6.6.x, for both x86 and x86_64.
On x86, I isolated it to the INT 0x80 backports added to v6.6.7:
239bff0171a8 x86/tdx: Allow 32-bit emulation by default
22ca647c8f88 x86/entry: Do not allow external 0x80 interrupts
4591766ff655 x86/entry: Convert INT 0x80 emulation to IDTENTRY
34c686e5be2f x86/coco: Disable 32-bit emulation by default on TDX and SEV
f259af26ee04 x86: Introduce ia32_enabled()
The ia32_enabled() commit is a trivial compile dependency, and the Yocto
configuration doesn't even build arch/x86/coco/tdx/tdx.c, which leaves
just the middle three commits. I didn't try to bisect within those, since
it seemed fairly clear to me that they were meant to be taken as a group.
To confirm my diagnosis, I reverted this group of changes on a v6.6.7
baseline, and the sporadic PCI hang went away.
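A reproduction loop of this kind (boot repeatedly, treat a missing login
prompt within a deadline as a hang, and keep the last console line to see
where it stopped) can be sketched as below. This is a minimal Python
sketch, not the Yocto auto-builder's actual harness: the names boot_once()
and run_campaign(), the "login:" marker and the timeout are assumptions,
and the real qemu command line is the one attached to the Yocto bug.

```python
import queue
import subprocess
import sys
import threading
import time


def boot_once(cmd, marker, timeout_s):
    """Boot once and return (status, last_console_line).

    status is "ok" if `marker` (assumed: a login prompt) reached the console
    before the deadline, "hang" if the deadline passed first, or "exited" if
    the process ended without printing the marker.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True,
                            errors="replace")
    lines = queue.Queue()

    def pump():
        # Forward console lines to the caller; None marks end of output.
        for line in proc.stdout:
            lines.put(line.rstrip("\n"))
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()
    deadline = time.monotonic() + timeout_s
    status, last = "hang", ""
    try:
        while (remaining := deadline - time.monotonic()) > 0:
            try:
                line = lines.get(timeout=remaining)
            except queue.Empty:
                break  # no output before the deadline: treat as a hang
            if line is None:
                status = "exited"
                break
            if line:
                last = line
            if marker in line:
                status = "ok"
                break
    finally:
        # A booted (or hung) guest never exits on its own, so always kill it.
        proc.kill()
        proc.wait()
    return status, last


def run_campaign(cmd, marker, timeout_s, boots):
    """Repeat boot_once() and tally outcomes, logging every non-ok boot."""
    tally = {"ok": 0, "hang": 0, "exited": 0}
    for i in range(boots):
        status, last = boot_once(cmd, marker, timeout_s)
        tally[status] += 1
        if status != "ok":
            print(f"boot {i}: {status}, last console line: {last!r}",
                  file=sys.stderr)
    return tally
```

For example, run_campaign(QEMU_CMD, "login:", 300, 700), with the
hypothetical QEMU_CMD list filled in from the bug report, would mirror an
overnight 700-boot run. Reading the console through a thread with a
deadline is what lets a hung guest be detected even though qemu itself
never exits.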
I then went to mainline and tested the merge that added the series:
commit f35e46631b28a63ca3887d7afef1a65a5544da52
Author: Linus Torvalds <torvalds@...ux-foundation.org>
Date: Thu Dec 7 11:56:34 2023 -0800
Merge tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
It took about 400 runs, but the PCI hang eventually showed up there too.
Of course, the BHI changes touch a lot of the same files, so I wondered
whether the issue would remain. I tested v6.6.27 (which has the BHI
backports) and the hang still happens. However, the INT 0x80 changes can
no longer be reverted easily, now that they are buried under the BHI
changes.
I then let v6.9-rc5 run overnight (700 boots) and "caught" three
instances of the PCI hang.
Finally, I took today's linux-next (next-20240424) and reproduced the
PCI hang within 50 boots. I can't explain the variability in how quickly
it reproduces, other than the tests running on a shared machine.
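The reproduction rates above can be put side by side under a simple
model. Assuming (and this is only an assumption; a shared, loaded host
may well break it) that every boot hangs independently with the same
probability p, the chance of at least one hang in N boots is
1 - (1 - p)^N, and the number of consecutive clean boots needed before
"no hang seen" reaches confidence c is ln(1 - c) / ln(1 - p), rounded up.
A small Python sketch, taking p from the overnight v6.9-rc5 run:

```python
import math


def p_at_least_one(rate, boots):
    """Chance of >= 1 hang in `boots` runs, assuming independent boots."""
    return 1.0 - (1.0 - rate) ** boots


def boots_for_confidence(rate, confidence):
    """Consecutive clean boots needed before 'no hang' reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


rate = 3 / 700  # overnight v6.9-rc5 run: 3 hangs in 700 boots
for n in (50, 400, 700):
    print(f"P(>=1 hang in {n} boots) = {p_at_least_one(rate, n):.2f}")
print("clean boots for 95% confidence:", boots_for_confidence(rate, 0.95))
```

With p = 3/700 this gives about a 19% chance of a hang within 50 boots
and roughly 82% within 400, so under this model a wide spread between
runs is not by itself surprising. The same model says about 700
consecutive clean boots are needed before a revert can be called a fix
with 95% confidence.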
Not sure what to do next. Figured step #1 was to report it, at least.
Many more details, including the v6.9-rc5 .config and the full qemu
argument list, are in the Yocto bug:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=15463
Paul.
--
Linux version 6.9.0-rc5-next-20240424-yocto-standard (oe-user@...host) (i686-poky-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.42.0.20240216) #1 SMP PREEMPT_DYNAMIC Wed Apr 24 10:57:01 UTC 2024
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[...]
acpi PNP0A08:00: _OSC: platform does not support [LTR]
acpi PNP0A08:00: _OSC: OS now controls [PME PCIeCapability]
acpi resource window ([0x100000000-0x8ffffffff] ignored, not CPU addressable)
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
pci_bus 0000:00: root bus resource [mem 0x10000000-0xafffffff window]
pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:00.0: [8086:29c0] type 00 class 0x060000 conventional PCI endpoint
pci 0000:00:01.0: [1234:1111] type 00 class 0x030000 conventional PCI endpoint
pci 0000:00:01.0: BAR 0 [mem 0xfd000000-0xfdffffff pref]
pci 0000:00:01.0: BAR 2 [mem 0xfebd0000-0xfebd0fff]
pci 0000:00:01.0: ROM [mem 0xfebc0000-0xfebcffff pref]
pci 0000:00:01.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
pci 0000:00:02.0: [1af4:1000] type 00 class 0x020000 conventional PCI endpoint
pci 0000:00:02.0: BAR 0 [io 0xc040-0xc05f]
pci 0000:00:02.0: BAR 1 [mem 0xfebd1000-0xfebd1fff]
pci 0000:00:02.0: BAR 4 [mem 0xfe000000-0xfe003fff 64bit pref]
pci 0000:00:02.0: ROM [mem 0xfeb80000-0xfebbffff pref]
pci 0000:00:03.0: [1af4:1005] type 00 class 0x00ff00 conventional PCI endpoint
pci 0000:00:03.0: BAR 0 [io 0xc060-0xc07f]
pci 0000:00:03.0: BAR 1 [mem 0xfebd2000-0xfebd2fff]
pci 0000:00:03.0: BAR 4 [mem 0xfe004000-0xfe007fff 64bit pref]
pci 0000:00:1d.0: [8086:2934] type 00 class 0x0c0300 conventional PCI endpoint
pci 0000:00:1d.0: BAR 4 [io 0xc080-0xc09f]
pci 0000:00:1d.1: [8086:2935] type 00 class 0x0c0300 conventional PCI endpoint
pci 0000:00:1d.1: BAR 4 [io 0xc0a0-0xc0bf]
pci 0000:00:1d.2: [8086:2936] type 00 class 0x0c0300 conventional PCI endpoint
<hang - not always exactly here, but always in this block of PCI printk>