Message-ID: <805b2e50-00f7-40a4-975c-6bfe7d3c431b@huawei.com>
Date: Mon, 31 Mar 2025 19:43:55 +0800
From: "liwei (JK)" <liwei728@...wei.com>
To: <bhelgaas@...gle.com>
CC: "liwei (JK)" <liwei728@...wei.com>, Xiongfeng Wang
	<wangxiongfeng2@...wei.com>, <libang.li@...group.com>,
	<bobo.shaobowang@...wei.com>, <weiliang.qwl@...group.com>,
	<zhaochuanfeng@...wei.com>, <linux-pci@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: [Question] pcie_do_recovery and pci_enable_sriov deadlock problem

Hi, Bjorn

I have encountered a PCI-related deadlock issue triggered by a NONFATAL
AER event during the kdump kernel boot process. However, I have not yet
devised a suitable fix for this problem and would appreciate your
guidance in resolving it. Could you please assist me with this?

The deadlock description is as follows:
A driver's probe is deferred (the device is on the
deferred_probe_pending_list), and the probe callback of its struct
pci_driver calls pci_enable_sriov(). If the device triggers an AER
NONFATAL event while that probe is running, the two paths deadlock.
We hit this during the kdump kernel boot sequence.

       The deferred_probe_work side is:

       deferred_probe_work_func
         ...
         __device_attach
           device_lock                         # hold the device_lock
             ...
             pci_enable_sriov
               sriov_enable
                 ...
                 pci_device_add
                   down_write(&pci_bus_sem)    # wait for the pci_bus_sem

       The AER side is:

       pcie_do_recovery
         pci_walk_bus
           down_read(&pci_bus_sem)           # hold the pci_bus_sem
             report_normal_detected
               device_lock                   # wait for device_unlock()
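
The two traces above form an ABBA lock-order inversion. As a rough
userspace illustration only (not kernel code), the sketch below models
it with pthreads. The mapping is an assumption made for the demo: a
pthread mutex stands in for device_lock and a pthread rwlock stands in
for pci_bus_sem. The names probe_side, aer_side and simulate_abba are
hypothetical. Trylock calls replace the blocking acquisitions so that
the program reports the inversion instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-ins (assumptions, not the kernel objects). */
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t pci_bus_sem = PTHREAD_RWLOCK_INITIALIZER;

static pthread_barrier_t both_hold;  /* each side holds its first lock */
static pthread_barrier_t both_tried; /* each side has tried its second lock */

struct abba_result {
	bool probe_blocked; /* probe side could not write-lock pci_bus_sem */
	bool aer_blocked;   /* AER side could not take device_lock */
};

static struct abba_result result;

/* Models deferred_probe_work_func: device_lock, then down_write(). */
static void *probe_side(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&device_lock);
	pthread_barrier_wait(&both_hold);

	int rc = pthread_rwlock_trywrlock(&pci_bus_sem);
	result.probe_blocked = (rc == EBUSY);
	if (rc == 0)
		pthread_rwlock_unlock(&pci_bus_sem);

	pthread_barrier_wait(&both_tried);
	pthread_mutex_unlock(&device_lock);
	return NULL;
}

/* Models pcie_do_recovery: down_read(), then device_lock. */
static void *aer_side(void *arg)
{
	(void)arg;
	pthread_rwlock_rdlock(&pci_bus_sem);
	pthread_barrier_wait(&both_hold);

	int rc = pthread_mutex_trylock(&device_lock);
	result.aer_blocked = (rc == EBUSY);
	if (rc == 0)
		pthread_mutex_unlock(&device_lock);

	pthread_barrier_wait(&both_tried);
	pthread_rwlock_unlock(&pci_bus_sem);
	return NULL;
}

/* Runs both sides once and reports which second-lock attempts failed. */
struct abba_result simulate_abba(void)
{
	pthread_t probe, aer;

	result.probe_blocked = false;
	result.aer_blocked = false;
	pthread_barrier_init(&both_hold, NULL, 2);
	pthread_barrier_init(&both_tried, NULL, 2);

	pthread_create(&probe, NULL, probe_side, NULL);
	pthread_create(&aer, NULL, aer_side, NULL);
	pthread_join(probe, NULL);
	pthread_join(aer, NULL);

	pthread_barrier_destroy(&both_hold);
	pthread_barrier_destroy(&both_tried);
	return result;
}
```

Calling simulate_abba() from a main() should report both flags as
true: each side holds the lock the other one needs. With the blocking
calls used in the real traces, neither side would ever make progress.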


This issue was reported by Jay Fang <f.fangjian@...wei.com> in 2019.
Reference link: 
https://lore.kernel.org/linux-pci/bdfaaa34-3d3d-ad9a-4e24-4be97e85d216@huawei.com/T/#mcb7dfafd0f76beaddfc9f56a71aee6d984ed4a7f

Thanks,
Xiangwei Li
