linux-kernel - Re: PCI: hotplug_event: PCIe PLDA Device BAR Reset

Message-ID: <CAMciSVVZ6PMKsU=n8UmQi45ghe7KkzxhwJD0L6Yg9J4Yn9TnQQ@mail.gmail.com>
Date: Wed, 2 Apr 2025 10:54:52 +0530
From: Naveen Kumar P <naveenkumar.parna@...il.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org, 
	kernelnewbies <kernelnewbies@...nelnewbies.org>, linux-acpi@...r.kernel.org
Subject: Re: PCI: hotplug_event: PCIe PLDA Device BAR Reset

On Thu, Mar 20, 2025 at 3:11 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Wed, Mar 19, 2025 at 08:07:55PM +0530, Naveen Kumar P wrote:
> > ...
> > I am reaching out to follow up on the PCI BAR0 reset issue and its
> > potential connection to the ACPI errors observed in my system running
> > Linux kernel 6.13.0+.
> > ...
>
> Trying to finish up the last bits for the upcoming v6.15 merge window,
> will come back to this later.
I hope you're doing well. I understand you're busy with the v6.15
merge window, and I appreciate your time. When you get a chance, I’d
love to get your thoughts on this issue.

Since my last message, I have done some additional debugging by adding
debug prints in pci_conf1_read(), __pci_read_base(),
pciehp_configure_device(), and
pci_assign_unassigned_bridge_resources(). After building the kernel
with these changes, I monitored the system for four days.

Upon checking dmesg, this time no ACPI errors were observed. However,
when running lspci, I noticed the following behavior:

01:00.0 RAM memory: PLDA Device 5555 (rev ff)

[386370.935294] USER PCI READ: ret=0, bus=01 dev=00 func=0 pos=0x00 len=4 data=0xffffffff
...
[386370.944394] USER PCI READ: ret=0, bus=01 dev=00 func=0 pos=0x3c len=4 data=0xffffffff
[386371.048901] ACPI: \_SB_.PCI0.RP01: ACPI: ACPI_NOTIFY_BUS_CHECK event
...
[386371.049944] KERNEL PCI READ: res=0, bus=01 dev=00 func=0 pos=0x00 len=4 data=0x55551556
[386371.049971] KERNEL PCI READ: res=0, bus=01 dev=00 func=0 pos=0x04 len=4 data=0x100000
[386371.049995] KERNEL PCI READ: res=0, bus=01 dev=00 func=0 pos=0x08 len=4 data=0x5000000
[386371.050018] KERNEL PCI READ: res=0, bus=01 dev=00 func=0 pos=0x0c len=4 data=0x0
[386371.050040] KERNEL PCI READ: res=0, bus=01 dev=00 func=0 pos=0x10 len=4 data=0x0

Initially, lspci triggered pci_user_read_config_dword(), returning
0xffffffff for device 01:00.0.

Shortly after, the ACPI_NOTIFY_BUS_CHECK event was triggered.

Following this, pci_bus_read_config_dword() was called, and the PCI
config space recovered to normal values—except for BAR0, which
remained reset to zero.
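
For reference, the dwords read back after the bus check can be decoded
using the standard type 0 config header layout. The snippet below is
only a sketch; the helper name decode_header() is mine, and the input
values are copied from the KERNEL PCI READ lines in the log.

```python
# Decode the first dwords of a PCI type 0 config header.
# decode_header() is a hypothetical helper written for this mail,
# not a kernel or pciutils API.
def decode_header(dwords):
    """dwords maps config offset -> 32-bit value, as printed in the log."""
    id_reg = dwords[0x00]
    cmd_sts = dwords[0x04]
    class_rev = dwords[0x08]
    return {
        "vendor_id": id_reg & 0xFFFF,    # low 16 bits of offset 0x00
        "device_id": id_reg >> 16,       # high 16 bits of offset 0x00
        "command": cmd_sts & 0xFFFF,     # low 16 bits of offset 0x04
        "status": cmd_sts >> 16,         # high 16 bits of offset 0x04
        "revision": class_rev & 0xFF,    # low byte of offset 0x08
        "class_code": class_rev >> 8,    # upper 24 bits of offset 0x08
        "bar0": dwords[0x10],
    }

# Values copied from the KERNEL PCI READ lines in the log.
after_bus_check = {
    0x00: 0x55551556,
    0x04: 0x00100000,
    0x08: 0x05000000,
    0x0C: 0x00000000,
    0x10: 0x00000000,
}

hdr = decode_header(after_bus_check)
# Bit 1 of the command register is the Memory Space Enable bit.
memory_decode_enabled = bool(hdr["command"] & 0x2)

for name, value in hdr.items():
    print(f"{name:>10}: {value:#x}")
print("memory decode enabled:", memory_decode_enabled)
```

Decoded this way, the vendor/device ID (1556:5555) and class code
(0x050000, RAM memory) are back to normal, but the command register
reads 0x0000 along with BAR0 = 0. That would be consistent with the
device having gone through a reset without being reconfigured
afterwards, though I cannot confirm that from the log alone.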

Despite adding debug prints, none of them appeared in dmesg, which
suggests that none of the instrumented functions were involved in this
recovery process.

The key questions now are:
Why does the PCIe device 01:00.0 silently return 0xffffffff for config
space reads? If the device was powered down or reset, can the related
event be traced in the kernel code? Note that
/sys/bus/pci/devices/0000:01:00.0/power/runtime_status still reports
"active".
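
To catch the moment the device drops off without running lspci by
hand, a small userspace check along these lines could be run
periodically. This is a rough sketch, assuming the sysfs "config" file
of the device is readable; the function names are hypothetical and it
only looks at the vendor/device ID dword and runtime_status.

```python
import struct
from pathlib import Path


def read_ids(config_path):
    """Return (vendor, device) from the first 4 bytes of a config-space file."""
    raw = Path(config_path).read_bytes()[:4]
    # Config space is little-endian: vendor ID first, then device ID.
    return struct.unpack("<HH", raw)


def looks_absent(vendor, device):
    """All-ones IDs are what a config read returns when no device responds."""
    return vendor == 0xFFFF and device == 0xFFFF


def check(device_dir):
    """Return (vendor, device, runtime_status, absent) for one sysfs device dir."""
    d = Path(device_dir)
    vendor, device = read_ids(d / "config")
    status = (d / "power" / "runtime_status").read_text().strip()
    return vendor, device, status, looks_absent(vendor, device)
```

For example, check("/sys/bus/pci/devices/0000:01:00.0") would be
expected to report the IDs as 0xffff/0xffff with runtime_status still
"active" while the device is in the failed state, which matches what I
observed with lspci.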

After the ACPI_NOTIFY_BUS_CHECK event, how is the config space (except
BAR0) recovered?

Which function is responsible for handling this recovery?

Any insights or pointers would be greatly appreciated.

Looking forward to your thoughts.

Thanks again for your time!

>
> Bjorn

View attachment "dmesg_april2nd_log.txt" of type "text/plain" (158749 bytes)

Download attachment "0006-added-debug-prints-in-pci_conf1_read-and-__pci_read_.patch" of type "application/octet-stream" (3682 bytes)