Message-ID: <fe8a89b501e44737821fe8b0ab4492e9@huawei.com>
Date: Wed, 14 Jan 2026 09:49:44 +0000
From: Kangfenglong <kangfenglong@...wei.com>
To: "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "shenyang
 (M)" <shenyang39@...wei.com>, "Zengtao (B)" <prime.zeng@...ilicon.com>,
	"shenjian (K)" <shenjian15@...wei.com>, "Wangyu (Eric)"
	<seven.wangyu@...wei.com>
Subject: [BUG] PCI/DPC: NULL pointer dereference in pci_bus_read_config_dword
 during DPC recovery racing with hotplug

Hello Linux PCI maintainers,

I am reporting a suspected use-after-free in the PCIe DPC recovery
path that occurs when DPC recovery races with hotplug removal.
It results in a NULL pointer dereference in pci_bus_read_config_dword()
and a kernel panic.

## 1. Bug Summary

Title: Use-after-free race between DPC recovery and hotplug removal
Root Cause (suspected): DPC recovery accesses a pci_dev after hotplug freed it
Severity: High (kernel panic, system crash)
Kernel Version: 5.10.0
Architecture: ARM64 (aarch64)

## 2. Timeline Analysis

### 2.1 DPC Event (T=1411s)
[ 1411.421320][ T2808] pcieport 0000:5f:00.0: DPC: containment event
[ 1411.463810][ T2808] pcieport 0000:5f:00.0: PCIe Bus Error: severity=Uncorrected

DPC recovery starts on port 0000:5f:00.0 (thread T2808).

### 2.2 Hotplug Link Down (T=1415s)
[ 1415.621146][ T2807] pcieport 0000:5f:00.0: pciehp: Slot(3): Link Down

Hotplug detects link down on same slot (thread T2807).
RACE BEGINS: DPC and hotplug run concurrently.

### 2.3 Device Recovery Fails (T=1415s-1479s)
[ 1415.650432][ T8030] mlx5_core 0000:60:00.1: mlx5_error_sw_reset: start
[ 1479.604410][ T8030] mlx5_health_try_recover: health recovery flow aborted

The mlx5 driver's own recovery on 0000:60:00.1 is aborted after about 64 seconds.

### 2.4 Hotplug Removal Starts (T=1479s)
[ 1479.962953][ T2808] pci 0000:60:00.1: AER: can't recover
[ 1479.962961][ T2807] pci 0000:60:00.1: Removing from iommu group 67

Hotplug begins removing device 0000:60:00.1.

### 2.5 DPC Recovery Continues (T=1481s)
[ 1481.412327][ T2808] pci 0000:60:00.0: not ready 1023ms after DPC

DPC recovery still active for sibling device 0000:60:00.0.

### 2.6 Hotplug Re-enumeration (T=1481s)
[ 1481.892474][ T2807] pciehp: Slot(3): Card present
[ 1481.899188][ T2807] pciehp: Slot(3): Link Up

Hotplug re-detects and re-enumerates device.

### 2.7 The Crash (T=1484s)
[ 1484.742859][ T2808] Unable to handle kernel NULL pointer dereference
[ 1485.009989][ T2808] pc : pci_bus_read_config_dword+0x1b0/0x350

The DPC thread (T2808) hits a NULL pointer dereference while reading config space.
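
To make the overlap between the two threads explicit, here is a small sketch that parses the log lines quoted in the timeline and computes the window during which both T2808 (DPC) and T2807 (hotplug) were active. The helper names (`parse`, `thread_windows`, `overlap`) are mine and purely illustrative; the log lines are copied from the excerpts above.

```python
import re

# Matches the "[ seconds][ Tnnnn] message" prefix used in the excerpts above.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\[\s*T(\d+)\]\s+(.*)$")

LOG = """\
[ 1411.421320][ T2808] pcieport 0000:5f:00.0: DPC: containment event
[ 1415.621146][ T2807] pcieport 0000:5f:00.0: pciehp: Slot(3): Link Down
[ 1479.962953][ T2808] pci 0000:60:00.1: AER: can't recover
[ 1479.962961][ T2807] pci 0000:60:00.1: Removing from iommu group 67
[ 1481.412327][ T2808] pci 0000:60:00.0: not ready 1023ms after DPC
[ 1481.892474][ T2807] pciehp: Slot(3): Card present
[ 1481.899188][ T2807] pciehp: Slot(3): Link Up
[ 1484.742859][ T2808] Unable to handle kernel NULL pointer dereference
"""


def parse(log):
    """Return (timestamp, thread id, message) tuples sorted by time."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            events.append((float(m.group(1)), int(m.group(2)), m.group(3)))
    return sorted(events)


def thread_windows(events):
    """Map each thread id to its (first, last) timestamp in the log."""
    windows = {}
    for ts, tid, _ in events:
        first, last = windows.get(tid, (ts, ts))
        windows[tid] = (min(first, ts), max(last, ts))
    return windows


def overlap(a, b):
    """Return the intersection of two (start, end) windows, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


events = parse(LOG)
windows = thread_windows(events)
print(overlap(windows[2808], windows[2807]))
```

For these lines the two threads overlap from the Link Down message until Link Up, and the crash in T2808 follows roughly 2.8 seconds after hotplug re-enumerated the slot.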

## 3. Call Trace
[ 1484.742859][ T2808] Unable to handle kernel NULL pointer dereference
[ 1484.814910][ T2808] Internal error: Oops: 0000000096000004 [#1] SMP
[ 1485.009989][ T2808] pc : pci_bus_read_config_dword+0x1b0/0x350
[ 1485.015808][ T2808] lr : pci_read_config_dword+0x4c/0xa0
[ 1485.120656][ T2808]  pci_bus_read_config_dword+0x1b0/0x350
[ 1485.126130][ T2808]  pci_read_config_dword+0x4c/0xa0
[ 1485.131084][ T2808]  pci_dev_wait+0xf0/0x230
[ 1485.135341][ T2808]  pci_bridge_wait_for_secondary_bus+0x204/0x4f0
[ 1485.141509][ T2808]  dpc_reset_link+0x12c/0x534
[ 1485.146027][ T2808]  pcie_do_recovery+0x26c/0x6e0
[ 1485.150718][ T2808]  dpc_handler+0x90/0x170
[ 1485.154890][ T2808]  irq_thread_fn+0x50/0x180
[ 1485.159234][ T2808]  irq_thread+0x144/0x210
[ 1485.163407][ T2808]  kthread+0x190/0x210
[ 1485.167318][ T2808]  ret_from_fork+0x10/0x18
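
As a cross-check on the fault type, the value on the Oops line can be decoded, assuming it is the ESR_EL1 syndrome as arm64 kernels print it. The sketch below extracts only the fields relevant here; the lookup tables are deliberately partial and the function name `decode_esr` is mine.

```python
# Partial tables: only the encodings needed to read this particular Oops.
EC_NAMES = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort from the current exception level",
}
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr):
    """Split an ESR_ELx value into the fields relevant to a data abort."""
    ec = (esr >> 26) & 0x3F      # exception class, bits [31:26]
    il = (esr >> 25) & 0x1       # instruction length bit, bit [25]
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits [24:0]
    return {
        "ec": ec,
        "il": il,
        "wnr": (iss >> 6) & 0x1,  # 0 = read access, 1 = write access
        "dfsc": iss & 0x3F,       # data fault status code, bits [5:0]
    }


fields = decode_esr(0x0000000096000004)
print(EC_NAMES[fields["ec"]], "/", DFSC_NAMES[fields["dfsc"]],
      "/", "write" if fields["wnr"] else "read")
```

Under that assumption the syndrome describes a read access from kernel context that took a level 0 translation fault, which is consistent with a load through a NULL (or near-NULL) pointer inside pci_bus_read_config_dword().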

## 4. Root Cause
I suspect that a use-after-free race condition causes this issue:

Thread T2808 (DPC):
dpc_handler() -> pcie_do_recovery() -> dpc_reset_link() ->
pci_bridge_wait_for_secondary_bus() -> pci_dev_wait() ->
pci_read_config_dword()

Thread T2807 (hotplug):
Detects link down -> removes device -> frees pci_dev ->
re-enumerates device -> creates new pci_dev

DPC recovery assumes the child pci_dev stays valid while it waits for the
device to become ready, but hotplug can free and recreate that pci_dev
concurrently.
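
The suspected interleaving can be shown with a toy, single-threaded model. This is not kernel code: `ToyDev` is a hypothetical stand-in for a refcounted object, and `run()` simply replays the order "recovery looks up the child, hotplug removes and re-enumerates, recovery reads config space". It only illustrates why an unreferenced pointer goes stale; whether taking a reference is sufficient in the real DPC path is exactly what I am asking below.

```python
class ToyDev:
    """Toy stand-in for a refcounted device object (not struct pci_dev)."""

    def __init__(self, name):
        self.name = name
        self.refs = 1        # reference held by the bus's device list
        self.freed = False

    def get(self):
        assert not self.freed, "get() on a freed object"
        self.refs += 1
        return self

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # models the release callback freeing memory

    def read_config(self):
        if self.freed:
            raise RuntimeError(f"use-after-free on {self.name}")
        return 0xFFFFFFFF      # arbitrary placeholder value


def run(waiter_takes_ref):
    """Replay the suspected interleaving; return a description of the outcome."""
    bus = [ToyDev("0000:60:00.0")]

    # Recovery path: pick the first child on the secondary bus.
    child = bus[0]
    if waiter_takes_ref:
        child.get()

    # Hotplug path runs before the recovery path reads config space.
    removed = bus.pop(0)
    removed.put()                        # bus list drops its reference
    bus.append(ToyDev("0000:60:00.0"))   # re-enumeration creates a new object

    try:
        child.read_config()
        outcome = "ok (stale object, still allocated)"
    except RuntimeError as exc:
        outcome = str(exc)
    finally:
        if waiter_takes_ref:
            child.put()
    return outcome


print(run(waiter_takes_ref=False))
print(run(waiter_takes_ref=True))
```

In the model, the reference only keeps the old object allocated. The recovery path is still talking to a device object that hotplug has already replaced, so a reference alone may not be the complete answer.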

## 5. System Information

Hardware: HiSilicon ARM64 server
CPU: ARM64, core 11 affected
NIC: Mellanox ConnectX-6 (0000:60:00.0/1)
Upstream Port: 0000:5f:00.0

Kernel: 5.10.0
Modules: mlx5_core
Config: CONFIG_HOTPLUG_PCI_PCIE=y, CONFIG_PCIE_DPC=y

## 6. Questions

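1. Is there existing synchronization between DPC recovery and pciehp
   hotplug handling that should prevent this?
2. Should the DPC recovery path hold a reference with
   pci_dev_get()/pci_dev_put() while it waits on the child device?
3. Have similar races been fixed in newer kernels?
4. What is the preferred approach for fixing this?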

Thank you.

Best regards

