From: Yinghai Lu Subject: [PATCH 01/10] PCI: Make sriov work with hotplug remove When hot remove pci express module that have pcie switch and support SRIOV, got [ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1 [ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received [ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3) [ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9 [ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press. [ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10 [ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 [ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0 [ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00 [ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9 [ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00 [ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6 [ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14 [ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15 [ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16 [ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled [ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null) [ 5926.127189] IP: [] pci_stop_bus_device+0x33/0x83 [ 5926.133377] PGD 0 [ 5926.135402] Oops: 0000 [#1] SMP [ 5926.138659] CPU 2 [ 5926.140499] Modules linked in: ... [ 5926.143754] [ 5926.275823] Call Trace: [ 5926.278267] [] pci_stop_bus_device+0x30/0x83 [ 5926.284180] [] pci_remove_bus_device+0x1a/0xba [ 5926.290264] [] pciehp_unconfigure_device+0x110/0x17b [ 5926.296866] [] ? pciehp_disable_slot+0x188/0x188 [ 5926.303123] [] pciehp_disable_slot+0x11e/0x188 [ 5926.309206] [] pciehp_power_thread+0x8f/0xe0 ... +-[0000:80]-+-00.0-[81-8f]-- | +-01.0-[90-9f]-- | +-02.0-[a0-af]-- | +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device | | | +-00.1 Device | | | +-00.2 Device | | | \-00.3 Device | | \-03.0-[b3]--+-00.0 Device | | +-00.1 Device | | +-00.2 Device | | \-00.3 Device root complex: 80:02.2 pci express modules: have pcie switch and are listed as b0:00.0, b1:02.0 and b1:03.0. end devices are b2:00.0 and b3.00.0. VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3 Root cause: when doing pci_stop_bus_device() with phys fn, it will stop virt fn and remove the fn, so list_for_each_safe(l, n, &bus->devices) will have problem to refer freed n that is pointed to vf entry. Solution is just call pci_stop_bus_device() with phys fn only. and before that need to save phys fn aside and avoid to use bus->devices to loop. During reviewing the patch, Bjorn said: | The PCI hot-remove path calls pci_stop_bus_devices() via | pci_remove_bus_device(). | | pci_stop_bus_devices() traverses the bus->devices list (point A below), | stopping each device in turn, which calls the driver remove() method. When | the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which | also uses pci_remove_bus_device() to remove the VF devices from the | bus->devices list (point B). | | pci_remove_bus_device | pci_stop_bus_device | pci_stop_bus_devices(subordinate) | list_for_each(bus->devices) <-- A | pci_stop_bus_device(PF) | ... | driver->remove | pci_disable_sriov | ... | pci_remove_bus_device(VF) | <-- B | | At B, we're changing the same list we're iterating through at A, so when | the driver remove() method returns, the pci_stop_bus_devices() iterator has | a pointer to a list entry that has already been freed. | | This patch avoids the problem by building a separate list of all PFs on | the bus and traversing that at A instead of the bus->devices list. Discussion thread can be found : https://lkml.org/lkml/2011/10/15/141 Signed-off-by: Yinghai Lu --- drivers/pci/remove.c | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) Index: linux-2.6/drivers/pci/remove.c =================================================================== --- linux-2.6.orig/drivers/pci/remove.c +++ linux-2.6/drivers/pci/remove.c @@ -120,10 +120,43 @@ void pci_remove_behind_bridge(struct pci pci_remove_bus_device(pci_dev_b(l)); } +struct dev_list { + struct pci_dev *dev; + struct list_head list; +}; + static void pci_stop_bus_devices(struct pci_bus *bus) { struct list_head *l, *n; + struct dev_list *dl, *dn; + LIST_HEAD(physfn_list); + + /* Save phys_fn aside at first */ + list_for_each(l, &bus->devices) { + struct pci_dev *dev = pci_dev_b(l); + + if (!dev->is_virtfn) { + dl = kmalloc(sizeof(*dl), GFP_KERNEL); + if (!dl) + continue; + dl->dev = dev; + list_add_tail(&dl->list, &physfn_list); + } + } + + /* + * stop bus device for phys_fn at first + * it will stop and remove vf in driver remove action + */ + list_for_each_entry_safe(dl, dn, &physfn_list, list) { + struct pci_dev *dev = dl->dev; + + pci_stop_bus_device(dev); + + kfree(dl); + } + /* Do it again for left over in case */ list_for_each_safe(l, n, &bus->devices) { struct pci_dev *dev = pci_dev_b(l); pci_stop_bus_device(dev);