Message-ID: <fb359cf0-80fe-1a99-406e-aee8362b4138@huawei.com>
Date: Tue, 22 Aug 2017 15:45:56 +0800
From: Kefeng Wang <wangkefeng.wang@...wei.com>
To: Bjorn Helgaas <bhelgaas@...gle.com>
CC: Jon Derrick <jonathan.derrick@...el.com>,
Keith Busch <keith.busch@...el.com>,
Gabriele Paoloni <gabriele.paoloni@...wei.com>,
"Guohanjun (Hanjun Guo)" <guohanjun@...wei.com>,
<wangkefeng.wang@...wei.com>, <linux-pci@...r.kernel.org>,
"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>
Subject: [Question] Race between pcie hot plug and pcie aer report?
Hi Bjorn and all,
Is there any mechanism to prevent a race between PCIe hotplug and PCIe AER error reporting?
I am not familiar with this code, so please correct me if I am wrong. We hit a NULL pointer dereference
when injecting an uncorrectable error [UNCOR_STATUS RX_OVER] into an mlx device; see the attachment for details.
aer_isr
-do_recovery
-broadcast_error_message
-pci_walk_bus(dev->bus, cb, &result_data); // bus is NULL
There were also some other problems in the log before this error, e.g.:
[26924.661928] pci \xffffff90QW\xffffffb9: broadcast slot_reset message
...
[26926.455484] (null): broadcast resume message
After checking the log, it appears that PCIe hotplug is also triggered while the AER report is being processed:
pciehp_power_thread
- case DISABLE_REQ: pciehp_disable_slot(p_slot);
-pciehp_unconfigure_device
-pci_stop_and_remove_bus_device
-pci_stop_bus_device
--pci_device_remove
--remove_one[mlx5_core]
- case ENABLE_REQ: pciehp_enable_slot(p_slot);
-pciehp_configure_device
-pci_bus_add_devices
-pci_bus_add_device
--pci_device_probe
--local_pci_probe
--init_one[mlx5_core]
So I think the problem is that PCIe hotplug releases the PCIe device while the AER recovery path is still using it,
which leads to the NULL pointer dereference and the other errors shown above.
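To make the suspected sequence concrete, here is a minimal userspace sketch of the pattern that would avoid it: the error-handling path takes its own reference before using the device and re-checks the bus pointer. This is not kernel code. All names (model_dev, model_bus, model_dev_get, model_dev_put, model_hot_remove, model_broadcast) are hypothetical stand-ins, the counter is a plain int rather than an atomic, and no locking is modeled; it only assumes that the device is reference counted and that removal detaches it from its bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a bus and a device; not the kernel structs. */
struct model_bus {
    int number;
};

struct model_dev {
    int refcount;           /* plain int: this model is single-threaded */
    struct model_bus *bus;  /* set to NULL when the device is removed */
};

/* Counts how many devices have actually been freed. */
static int model_release_count;

static struct model_dev *model_dev_alloc(struct model_bus *bus)
{
    struct model_dev *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    dev->refcount = 1;      /* reference owned by the hotplug side */
    dev->bus = bus;
    return dev;
}

/* Take an extra reference so the object outlives a concurrent removal. */
static struct model_dev *model_dev_get(struct model_dev *dev)
{
    if (dev)
        dev->refcount++;
    return dev;
}

/* Drop a reference; the last put frees the object. */
static void model_dev_put(struct model_dev *dev)
{
    if (!dev)
        return;
    assert(dev->refcount > 0);
    if (--dev->refcount == 0) {
        model_release_count++;
        free(dev);
    }
}

/* Hot-remove path: detach from the bus, then drop the owner's reference. */
static void model_hot_remove(struct model_dev *dev)
{
    dev->bus = NULL;
    model_dev_put(dev);
}

/*
 * Error-recovery path: the caller must hold its own reference.
 * Returns the bus number that would be walked, or -1 if the device
 * has already been detached (instead of dereferencing a NULL bus).
 */
static int model_broadcast(struct model_dev *dev)
{
    struct model_bus *bus = dev->bus;

    if (!bus)
        return -1;
    return bus->number;
}
```

In this model, if the error path calls model_dev_get() before the removal runs, a later model_broadcast() sees bus == NULL and bails out rather than touching freed memory, and the object is only freed by the final model_dev_put(). Whether the AER code can take such a reference early enough, and what lock would be needed around the bus check, is exactly what I am unsure about.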
Any thoughts?
Thanks,
Kefeng
View attachment "pciehp_aer_race.txt" of type "text/plain" (14724 bytes)