Message-ID: <fb359cf0-80fe-1a99-406e-aee8362b4138@huawei.com>
Date:   Tue, 22 Aug 2017 15:45:56 +0800
From:   Kefeng Wang <wangkefeng.wang@...wei.com>
To:     Bjorn Helgaas <bhelgaas@...gle.com>
CC:     Jon Derrick <jonathan.derrick@...el.com>,
        Keith Busch <keith.busch@...el.com>,
        Gabriele Paoloni <gabriele.paoloni@...wei.com>,
        "Guohanjun (Hanjun Guo)" <guohanjun@...wei.com>,
        <wangkefeng.wang@...wei.com>, <linux-pci@...r.kernel.org>,
        "Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>
Subject: [Question] Race between pcie hot plug and pcie aer report?

Hi Bjorn and all,

   Is there some mechanism to prevent a race between PCIe hotplug and PCIe AER reporting?
I am unfamiliar with both, so correct me if I am wrong. We hit a NULL pointer dereference
when injecting an uncorrectable error [UNCOR_STATUS RX_OVER] into an mlx device; see the details in the attachment.

aer_isr
 -do_recovery
  -broadcast_error_message
   -pci_walk_bus(dev->bus, cb, &result_data); // bus  is  NULL

There are also some other issues logged before this error, e.g.:

[26924.661928] pci \xffffff90QW\xffffffb9: broadcast slot_reset message
 ...
[26926.455484]  (null): broadcast resume message


After checking the log, I see that PCIe hotplug is also triggered while the AER report is being processed.

pciehp_power_thread
 - case DISABLE_REQ: pciehp_disable_slot(p_slot);
   -pciehp_unconfigure_device
    -pci_stop_and_remove_bus_device
     -pci_stop_bus_device
      --pci_device_remove
       --remove_one[mlx5_core]
 - case ENABLE_REQ: pciehp_enable_slot(p_slot);
   -pciehp_configure_device
    -pci_bus_add_devices
     -pci_bus_add_device
      --pci_device_probe
       --local_pci_probe
        --init_one[mlx5_core]


So I think the problem is that PCIe hotplug releases the pcie_dev while the AER path is still using it,
which leads to the NULL pointer dereference and the other errors.
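
To make the suspected lifetime problem concrete, here is a minimal user-space sketch, not kernel code. It assumes a simple reference-count model, loosely analogous to pci_dev_get()/pci_dev_put(), and it replays the interleaving sequentially in a single thread. All names (model_dev, model_bus, model_hotplug_remove, model_aer_broadcast, model_free_count) are hypothetical and exist only for illustration; whether the AER path in this kernel already pins the device, or should, is not something this sketch establishes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's structures. */
struct model_bus {
    int number;
};

struct model_dev {
    int refcount;           /* plain int: this model is single-threaded */
    struct model_bus *bus;  /* cleared on removal, like the NULL bus in the trace */
};

static int model_free_count;  /* number of devices actually freed */

/* Allocate a device attached to a bus; the bus owns the initial reference. */
static struct model_dev *model_dev_alloc(struct model_bus *bus)
{
    struct model_dev *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    dev->refcount = 1;
    dev->bus = bus;
    return dev;
}

/* Pin the device so it cannot be freed while the caller uses it. */
static struct model_dev *model_dev_get(struct model_dev *dev)
{
    if (dev)
        dev->refcount++;
    return dev;
}

/* Drop one reference; the last put frees the device. */
static void model_dev_put(struct model_dev *dev)
{
    if (!dev)
        return;
    assert(dev->refcount > 0);
    if (--dev->refcount == 0) {
        free(dev);
        model_free_count++;
    }
}

/* Hotplug side: detach from the bus and drop the bus's reference. */
static void model_hotplug_remove(struct model_dev *dev)
{
    dev->bus = NULL;
    model_dev_put(dev);
}

/*
 * AER side: returns 0 if the broadcast could proceed, -1 if it was
 * skipped because the device is no longer attached to a bus.
 */
static int model_aer_broadcast(struct model_dev *dev)
{
    if (!dev || !dev->bus)
        return -1;
    /* A real handler would walk dev->bus here. */
    return 0;
}
```

In this model, if the AER side calls model_dev_get() before the hotplug side runs model_hotplug_remove(), the memory stays valid until the matching model_dev_put(), and model_aer_broadcast() returns -1 instead of dereferencing a NULL bus. Without the extra reference, the removal frees the device and any later access is a use-after-free, which would be consistent with the garbled device name in the log above.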

Any thoughts?

Thanks,
Kefeng






View attachment "pciehp_aer_race.txt" of type "text/plain" (14724 bytes)
