linux-kernel - [PATCH] vfio/pci: Skip hot reset on Link-Down

Message-Id: <20251208074459.1297-1-guojinhui.liam@bytedance.com>
Date: Mon,  8 Dec 2025 15:44:59 +0800
From: "Jinhui Guo" <guojinhui.liam@...edance.com>
To: <alex@...zbot.org>
Cc: <guojinhui.liam@...edance.com>, <kvm@...r.kernel.org>, 
	<linux-kernel@...r.kernel.org>
Subject: [PATCH] vfio/pci: Skip hot reset on Link-Down

On hot-pluggable ports, simultaneous surprise removal of multiple
PCIe endpoints (whether by pulling the card, powering it off, or
dropping the link) can trigger a system deadlock.

Example: two PCIe endpoints are bound to vfio-pci and opened by
the same process (fdA for device A, fdB for device B).

1. A PCIe fault brings B’s link down, then A’s.
2. The PCI core starts removing B:
   - pciehp_unconfigure_device() takes pci_rescan_remove_lock
   - vfio-pci’s remove routine waits for fdB to be closed
3. While B is stuck, the core removes A:
   - pciehp_ist() takes the read side of reset_lock A
   - It then blocks on pci_rescan_remove_lock, which B’s removal
     already holds
4. Killing the process closes fdA first.
   vfio_pci_core_close_device() tries to hot-reset A, so it needs
   the write side of reset_lock A.
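5. The write request sleeps until the read lock from step 3 is
   released, but that reader is itself waiting for
   pci_rescan_remove_lock, which B’s removal holds until fdB is
   closed. fdB is never closed because the process is still stuck
   closing fdA
   -> deadlock.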
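
The circular wait in the steps above can be sketched as a small
user-space model. This is only an illustration, not kernel code: the
actor names, waits_in_cycle() and scenario_deadlocks() are hypothetical
helpers invented here, and each actor is assumed to wait on at most one
other actor.

```c
#include <assert.h>
#include <stdbool.h>

/* Actors from the scenario; the names are illustrative only. */
enum actor {
	HOTPLUG_B,	/* pciehp_ist() thread removing device B */
	HOTPLUG_A,	/* pciehp_ist() thread removing device A */
	USER_PROC,	/* exiting process that holds fdA and fdB */
	NR_ACTORS,
	NOBODY = -1,	/* marks an actor that is not waiting */
};

/* Follow the wait-for chain from start; true if it leads back to start. */
static bool waits_in_cycle(const int *waits_for, int start)
{
	int cur, steps;

	assert(start >= 0 && start < NR_ACTORS);

	cur = waits_for[start];
	for (steps = 0; cur != NOBODY && steps < NR_ACTORS; steps++) {
		if (cur == start)
			return true;
		cur = waits_for[cur];
	}
	return false;
}

/* Build the wait-for edges; skip_reset models skipping the hot reset. */
static bool scenario_deadlocks(bool skip_reset)
{
	int waits_for[NR_ACTORS];

	/* Step 2: B's removal holds pci_rescan_remove_lock, waits for fdB. */
	waits_for[HOTPLUG_B] = USER_PROC;
	/* Step 3: A's removal holds reset_lock A (read), waits for B. */
	waits_for[HOTPLUG_A] = HOTPLUG_B;
	/* Steps 4-5: closing fdA wants reset_lock A (write) unless skipped. */
	waits_for[USER_PROC] = skip_reset ? NOBODY : HOTPLUG_A;

	return waits_in_cycle(waits_for, USER_PROC);
}
```

In this model scenario_deadlocks(false) finds the cycle
USER_PROC -> HOTPLUG_A -> HOTPLUG_B -> USER_PROC, while
scenario_deadlocks(true) finds none, because the exiting process no
longer waits on reset_lock A and can go on to close fdB.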

The backtraces of the three stuck threads are as follows:
  /proc/1909/stack
    [<0>] vfio_unregister_group_dev+0x99/0xf0 [vfio]
    [<0>] vfio_pci_core_unregister_device+0x19/0xb0 [vfio_pci_core]
    [<0>] vfio_pci_remove+0x15/0x20 [vfio_pci]
    [<0>] pci_device_remove+0x3e/0xb0
    [<0>] device_release_driver_internal+0x19b/0x200
    [<0>] pci_stop_bus_device+0x6d/0x90
    [<0>] pci_stop_and_remove_bus_device+0xe/0x20
    [<0>] pciehp_unconfigure_device+0x8c/0x150
    [<0>] pciehp_disable_slot+0x68/0x140
    [<0>] pciehp_handle_presence_or_link_change+0x246/0x4c0
    [<0>] pciehp_ist+0x244/0x280
    [<0>] irq_thread_fn+0x1f/0x60
    [<0>] irq_thread+0x1ac/0x290
    [<0>] kthread+0xfa/0x240
    [<0>] ret_from_fork+0x209/0x260
    [<0>] ret_from_fork_asm+0x1a/0x30
  /proc/1910/stack
    [<0>] pciehp_unconfigure_device+0x43/0x150
    [<0>] pciehp_disable_slot+0x68/0x140
    [<0>] pciehp_handle_presence_or_link_change+0x246/0x4c0
    [<0>] pciehp_ist+0x244/0x280
    [<0>] irq_thread_fn+0x1f/0x60
    [<0>] irq_thread+0x1ac/0x290
    [<0>] kthread+0xfa/0x240
    [<0>] ret_from_fork+0x209/0x260
    [<0>] ret_from_fork_asm+0x1a/0x30
  /proc/6765/stack
    [<0>] pciehp_reset_slot+0x2c/0x70
    [<0>] pci_reset_hotplug_slot+0x3e/0x60
    [<0>] pci_reset_bus_function+0xcd/0x180
    [<0>] cxl_reset_bus_function+0xc8/0x110
    [<0>] __pci_reset_function_locked+0x4f/0xd0
    [<0>] vfio_pci_core_disable+0x381/0x400 [vfio_pci_core]
    [<0>] vfio_pci_core_close_device+0x63/0xd0 [vfio_pci_core]
    [<0>] vfio_df_close+0x48/0x80 [vfio]
    [<0>] vfio_df_group_close+0x32/0x70 [vfio]
    [<0>] vfio_device_fops_release+0x1d/0x40 [vfio]
    [<0>] __fput+0xe6/0x2b0
    [<0>] task_work_run+0x58/0x90
    [<0>] do_exit+0x29b/0xa80
    [<0>] do_group_exit+0x2c/0x80
    [<0>] get_signal+0x8f9/0x900
    [<0>] arch_do_signal_or_restart+0x29/0x210
    [<0>] exit_to_user_mode_loop+0x8e/0x4f0
    [<0>] do_syscall_64+0x262/0x630
    [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

Since the device is already disconnected, a hot reset serves no
purpose and risks generating additional PCIe link errors during
the unplug sequence. Fix the issue by skipping the hot reset in
vfio_pci_core_disable() when pci_dev_is_disconnected() reports
that the device is gone.

Signed-off-by: Jinhui Guo <guojinhui.liam@...edance.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3a11e6f450f7..f42051552dd4 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -678,6 +678,16 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 		if (!vdev->reset_works)
 			goto out;
 
+		/*
+		 * Skip hot reset on Link-Down. This avoids the reset_lock
+		 * deadlock in pciehp_reset_slot() when multiple PCIe devices
+		 * go down at the same time.
+		 */
+		if (pci_dev_is_disconnected(pdev)) {
+			vdev->needs_reset = false;
+			goto out;
+		}
+
 		pci_save_state(pdev);
 	}
 
-- 
2.20.1