Message-ID: <20250304210122.GA257363@bhelgaas>
Date: Tue, 4 Mar 2025 15:01:22 -0600
From: Bjorn Helgaas <helgaas@...nel.org>
To: Naveen Kumar P <naveenkumar.parna@...il.com>
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
	kernelnewbies <kernelnewbies@...nelnewbies.org>,
	linux-acpi@...r.kernel.org
Subject: Re: PCI: hotplug_event: PCIe PLDA Device BAR Reset

On Tue, Mar 04, 2025 at 10:19:07PM +0530, Naveen Kumar P wrote:
> On Tue, Mar 4, 2025 at 1:35 PM Naveen Kumar P
> <naveenkumar.parna@...il.com> wrote:
> ...

> For this test run, I removed all three parameters (pcie_aspm=off,
> pci=nomsi, and pcie_ports=on) and booted with the following kernel
> command line arguments:
> 
> cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-6.13.0+ root=/dev/mapper/vg00-rootvol ro quiet
> "dyndbg=file drivers/pci/* +p; file drivers/acpi/bus.c +p; file
> drivers/acpi/osl.c +p"
> 
> This time, the issue occurred earlier, at 22998 seconds. Below is the
> relevant dmesg log during the ACPI_NOTIFY_BUS_CHECK event. The
> complete log is attached (dmesg_march4th_log.txt).
> 
> [22998.536705] ACPI: \_SB_.PCI0.RP01: ACPI: ACPI_NOTIFY_BUS_CHECK event
> [22998.536753] ACPI: \_SB_.PCI0.RP01: ACPI: OSL: Scheduling hotplug event 0 for deferred handling
> [22998.536934] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bridge acquired in hotplug_event()
> [22998.536972] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [22998.537002] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Checking bridge in hotplug_event()
> [22998.537024] PCI READ: res=0, bus=01 dev=00 func=0 pos=0x00 len=4 data=0x55551556
> [22998.537066] PCI READ: res=0, bus=01 dev=00 func=0 pos=0x00 len=4 data=0x55551556

Fine again.

> [22998.537094] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Enabling slot in acpiphp_check_bridge()
> [22998.537155] ACPI: Device [PXSX] status [0000000f]
> [22998.537206] ACPI: Device [D015] status [0000000f]
> [22998.537276] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Releasing bridge in hotplug_event()
> 
> sudo lspci -xxx -s 01:00.0 | grep 10:
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

This is clearly the problem.  Can you start including the whole
"lspci -x -s 01:00.0" output?  The Vendor ID reads above worked
fine.  I *assume* the Vendor ID is still fine here, and only the
BARs are zeroed out?
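
A minimal sketch of that check, assuming the usual "lspci -x" hex dump
layout (offset, colon, 16 bytes per row) and a type 0 header with six
BARs at 0x10-0x24.  The helper names (parse_lspci_hex, summarize) and
the SAMPLE text are hypothetical.  Only the first dword (0x55551556)
and the zeroed "10:" row come from the logs above; every other byte in
SAMPLE is a placeholder, not real output from this device.

```python
import re

# Hex dump rows look like "10: 00 00 ..."; the device header line does not match.
ROW = re.compile(r"^([0-9a-fA-F]{2,3}):((?: [0-9a-fA-F]{2})+)\s*$")


def parse_lspci_hex(text):
    """Return {offset: byte} for every hex dump row found in the text."""
    cfg = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        base = int(m.group(1), 16)
        for i, byte in enumerate(m.group(2).split()):
            cfg[base + i] = int(byte, 16)
    return cfg


def dword(cfg, off):
    """Assemble a little-endian 32-bit value from four config bytes."""
    return sum(cfg[off + i] << (8 * i) for i in range(4))


def summarize(cfg):
    """Decode Vendor/Device ID and the six type 0 BARs (0x10-0x24)."""
    vendor = cfg[0] | (cfg[1] << 8)
    device = cfg[2] | (cfg[3] << 8)
    bars = [dword(cfg, 0x10 + 4 * n) for n in range(6)]
    return {
        "vendor": vendor,
        "device": device,
        # 0xffff usually means the read failed; 0x0000 is not a valid ID.
        "vendor_ok": vendor not in (0x0000, 0xFFFF),
        "bars": bars,
        "bars_all_zero": all(b == 0 for b in bars),
    }


# Placeholder dump: only the first dword and the "10:" row come from the thread.
SAMPLE = """\
01:00.0 (header line elided)
00: 56 15 55 55 00 00 00 00 00 00 00 00 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

if __name__ == "__main__":
    print(summarize(parse_lspci_hex(SAMPLE)))
```

Feeding it the saved "lspci -x -s 01:00.0" output from each run would
show at a glance whether the Vendor ID survived while the BARs did not.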

I assume you saw no new dmesg logs about config accesses to the device
before the lspci.  If you instrumented the user config accessors
(pci_user_read_config_*(), also in access.c), you should see those
accesses.

You could sprinkle some calls to early_dump_pci_device() through the
acpiphp path.  Turn off the kernel config access tracing when you do
this so it doesn't clutter things up.

What is this device?  Is it a shipping product?  Do you have good
confidence that the hardware is working correctly?  I guess you said
it works correctly on a different machine with an older kernel.  I
would swap the cards between machines in case one card is broken.

You could try bisecting between the working kernel and the broken one.
It's kind of painful since it takes so long to reproduce the problem.
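
To put a rough number on that cost: bisection needs about log2(N)
build-and-test rounds for N commits.  The sketch below is only an
estimate.  The commit count is a hypothetical placeholder, and the
per-round wait uses the 22998 seconds from this run as a lower bound,
since a "good" verdict needs at least that long without a failure and
the time to failure clearly varies between runs.

```python
import math


def bisect_estimate(n_commits, seconds_per_test):
    """Return (rounds, total_hours) for bisecting n_commits commits."""
    rounds = math.ceil(math.log2(n_commits)) if n_commits > 1 else 0
    return rounds, rounds * seconds_per_test / 3600.0


if __name__ == "__main__":
    # 10000 commits is an assumed figure, not taken from the thread.
    rounds, hours = bisect_estimate(10000, 22998)
    print(f"{rounds} rounds, about {hours:.1f} hours of waiting")
```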

Bjorn
