Message-ID: <20181009175632.GB5906@bhelgaas-glaptop.roam.corp.google.com>
Date:   Tue, 9 Oct 2018 12:56:32 -0500
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Jon Derrick <jonathan.derrick@...el.com>
Cc:     linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        Keith Busch <keith.busch@...el.com>,
        Sinan Kaya <okaya@...nel.org>,
        Oza Pawandeep <poza@...eaurora.org>,
        Matthew Wilcox <willy@...radead.org>,
        Lukas Wunner <lukas@...ner.de>, Christoph Hellwig <hch@....de>,
        Mika Westerberg <mika.westerberg@...ux.intel.com>
Subject: Re: [PATCH] PCI/portdrv: Enable error reporting on managed ports

On Tue, Sep 04, 2018 at 12:33:09PM -0600, Jon Derrick wrote:
> During probe, the port driver disables error reporting and assumes
> it will be enabled later by the AER driver's pci_walk_bus() sequence.
> This may not be the case for host-bridge enabled root ports, which will
> first enable error reporting on the bus during the root port probe, and
> then disable error reporting on downstream devices during subsequent
> probing of the bus.

I understand the hotplug case (see below), but help me understand this
"host-bridge enabled root ports" thing.  I'm not sure what that means.

We run pcie_portdrv_probe() for every root port, switch upstream port,
and switch downstream port, and it always disables error reporting for
the port:

  pcie_portdrv_probe          # pci_driver .probe
    pcie_port_device_register
      get_port_device_capability
        services |= PCIE_PORT_SERVICE_AER
        pci_disable_pcie_error_reporting
          # clear DEVCTL Error Reporting Enables
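For reference, the "DEVCTL Error Reporting Enables" in the trace above are bits 0-3 of the PCIe Device Control register: Correctable, Non-Fatal, Fatal, and Unsupported Request Reporting Enable. Here is a minimal sketch of what the enable/disable calls do to those bits. It assumes an in-memory copy of DEVCTL instead of the config-space read-modify-write the kernel actually performs. The helper names are hypothetical, and the constants use the same values as the kernel's PCI_EXP_DEVCTL_CERE/NFERE/FERE/URRE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error Reporting Enable bits in the PCIe Device Control register
 * (same values as the kernel's PCI_EXP_DEVCTL_CERE/NFERE/FERE/URRE). */
#define DEVCTL_CERE  0x0001u /* Correctable Error Reporting Enable */
#define DEVCTL_NFERE 0x0002u /* Non-Fatal Error Reporting Enable */
#define DEVCTL_FERE  0x0004u /* Fatal Error Reporting Enable */
#define DEVCTL_URRE  0x0008u /* Unsupported Request Reporting Enable */
#define DEVCTL_ERR_ENABLES \
	(DEVCTL_CERE | DEVCTL_NFERE | DEVCTL_FERE | DEVCTL_URRE)

/* Hypothetical helper: set the four enables in an in-memory DEVCTL copy. */
static void devctl_set_error_reporting(uint16_t *devctl)
{
	assert(devctl != NULL);
	*devctl = (uint16_t)(*devctl | DEVCTL_ERR_ENABLES);
}

/* Hypothetical helper: clear the four enables, leaving other fields alone. */
static void devctl_clear_error_reporting(uint16_t *devctl)
{
	assert(devctl != NULL);
	*devctl = (uint16_t)(*devctl & ~DEVCTL_ERR_ENABLES);
}

/* True only if all four enables are set. */
static bool devctl_error_reporting_enabled(uint16_t devctl)
{
	return (devctl & DEVCTL_ERR_ENABLES) == DEVCTL_ERR_ENABLES;
}
```

Only the four enable bits change. The other DEVCTL fields (payload and read-request sizes, relaxed ordering, and so on) are preserved.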

For root ports, we call aer_probe(), and it enables error reporting
for the entire tree below the root port:

  aer_probe                   # pcie_port_service .probe
    aer_enable_rootport
      set_downstream_devices_error_reporting(dev, true)
        pci_walk_bus(dev->subordinate, set_device_error_reporting)
          set_device_error_reporting
            if (Root Port || Upstream Port || Downstream Port)
              pci_enable_pcie_error_reporting
                # set DEVCTL Error Reporting Enables
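The aer_probe() path can be modeled with a simple device tree. This sketch assumes hypothetical struct dev nodes instead of the kernel's struct pci_bus device lists. walk_tree() is a rough stand-in for pci_walk_bus(). Unlike pci_walk_bus(dev->subordinate, ...), which covers only the devices below the port, walk_tree() also visits the starting device. set_port_error_reporting() mirrors the port-type filter in set_device_error_reporting().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; the kernel uses struct pci_dev/pci_bus. */
enum port_type { ROOT_PORT, UPSTREAM_PORT, DOWNSTREAM_PORT, ENDPOINT };

struct dev {
	enum port_type type;
	bool err_reporting;        /* stands in for the DEVCTL enables */
	struct dev *children[8];
	size_t nr_children;
};

typedef void (*dev_cb)(struct dev *d, void *data);

/* Pre-order walk of d and everything below it. */
static void walk_tree(struct dev *d, dev_cb cb, void *data)
{
	assert(d != NULL);
	cb(d, data);
	for (size_t i = 0; i < d->nr_children; i++)
		walk_tree(d->children[i], cb, data);
}

/* Mirrors the type check in set_device_error_reporting(): only Root,
 * Upstream and Downstream Ports are changed; endpoints are skipped. */
static void set_port_error_reporting(struct dev *d, void *data)
{
	bool enable = *(bool *)data;

	if (d->type == ROOT_PORT || d->type == UPSTREAM_PORT ||
	    d->type == DOWNSTREAM_PORT)
		d->err_reporting = enable;
}
```

Endpoints are left alone. In the example log, the nvme driver enables reporting for its own device.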

This is definitely broken for hot-added switches because aer_probe()
is the only place we enable error reporting, and it's only run when we
enumerate a root port, not when we hot-add things below that root
port.

> A hotplugged port device may also fail to enable error reporting as the
> AER driver has already run on the root bus.

> Check for these conditions and enable error reporting during portdrv
> probing.
> 
> Example case:

pcie_portdrv_probe(10000:00:00.0):
> [  343.790573] pcieport 10000:00:00.0: pci_disable_pcie_error_reporting

aer_probe(10000:00:00.0):
> [  343.809812] pcieport 10000:00:00.0: pci_enable_pcie_error_reporting
> [  343.819506] pci 10000:01:00.0: pci_enable_pcie_error_reporting
> [  343.828814] pci 10000:02:00.0: pci_enable_pcie_error_reporting
> [  343.838089] pci 10000:02:01.0: pci_enable_pcie_error_reporting
> [  343.847478] pci 10000:02:02.0: pci_enable_pcie_error_reporting
> [  343.856659] pci 10000:02:03.0: pci_enable_pcie_error_reporting
> [  343.865794] pci 10000:02:04.0: pci_enable_pcie_error_reporting
> [  343.874875] pci 10000:02:05.0: pci_enable_pcie_error_reporting
> [  343.883918] pci 10000:02:06.0: pci_enable_pcie_error_reporting
> [  343.892922] pci 10000:02:07.0: pci_enable_pcie_error_reporting

pcie_portdrv_probe(10000:01:00.0):
> [  343.918900] pcieport 10000:01:00.0: pci_disable_pcie_error_reporting

pcie_portdrv_probe(10000:02:00.0):
> [  343.968426] pcieport 10000:02:00.0: pci_disable_pcie_error_reporting

...
> [  344.028179] pcieport 10000:02:01.0: pci_disable_pcie_error_reporting
> [  344.091269] pcieport 10000:02:02.0: pci_disable_pcie_error_reporting
> [  344.156473] pcieport 10000:02:03.0: pci_disable_pcie_error_reporting
> [  344.238042] pcieport 10000:02:04.0: pci_disable_pcie_error_reporting
> [  344.321864] pcieport 10000:02:05.0: pci_disable_pcie_error_reporting
> [  344.411601] pcieport 10000:02:06.0: pci_disable_pcie_error_reporting
> [  344.505332] pcieport 10000:02:07.0: pci_disable_pcie_error_reporting

> [  344.621824] nvme 10000:06:00.0: pci_enable_pcie_error_reporting
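The example reduces to an ordering problem. aer_probe() on 10000:00:00.0 enables the switch ports before portdrv has bound to them, and each later pcie_portdrv_probe() clears the enables again. A hot-added port hits the same last step, with no aer_probe() after it. Below is a minimal replay of the logged order. It assumes a hypothetical boolean per port in place of DEVCTL, and it assumes the patch's aer_cap/native-AER conditions all hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state; true means the DEVCTL enables are set. */
struct port {
	const char *bdf;
	bool err_reporting;
};

/* Models pcie_portdrv_probe(): get_port_device_capability() always
 * disables; with the patch, pcie_port_device_register() enables again. */
static void portdrv_probe(struct port *p, bool patched)
{
	assert(p != NULL);
	p->err_reporting = false;
	if (patched)
		p->err_reporting = true;
}

/* Models aer_probe() on the Root Port: enables every port below it. */
static void aer_probe(struct port *ports, size_t n)
{
	for (size_t i = 0; i < n; i++)
		ports[i].err_reporting = true;
}

/* Replays the log: portdrv on the Root Port, aer_probe on the Root Port,
 * then portdrv on each switch port. Returns how many ports end enabled. */
static size_t replay_log_order(struct port *ports, size_t n, bool patched)
{
	assert(ports != NULL && n > 0);
	portdrv_probe(&ports[0], patched);		/* 10000:00:00.0 */
	aer_probe(ports, n);
	for (size_t i = 1; i < n; i++)
		portdrv_probe(&ports[i], patched);	/* switch ports */

	size_t enabled = 0;
	for (size_t i = 0; i < n; i++)
		enabled += ports[i].err_reporting;
	return enabled;
}
```

Without the patch, only the Root Port ends up enabled, because it was probed before aer_probe() ran. With the extra enable, every port does.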
> 
> Signed-off-by: Jon Derrick <jonathan.derrick@...el.com>
> ---
>  drivers/pci/pcie/portdrv_core.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 7c37d81..fdd953a 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -343,6 +343,16 @@ int pcie_port_device_register(struct pci_dev *dev)
>       if (!nr_service)
>               goto error_cleanup_irqs;
>  
> +#ifdef CONFIG_PCIEAER
> +     /*
> +      * Enable error reporting for this port in case AER probing has already
> +      * run on the root bus or this port device is hot-inserted
> +      */
> +     if (dev->aer_cap && pci_aer_available() &&
> +         (pcie_ports_native || pci_find_host_bridge(dev->bus)->native_aer))
> +             pci_enable_pcie_error_reporting(dev);
> +#endif
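The guard in this hunk enables reporting only when three conditions hold. The port must have an AER capability, and pci_aer_available() must report that AER is usable. The OS must also own AER, either because it was forced with pcie_ports_native (the "pcie_ports=native" boot option) or because ownership was granted to the host bridge (native_aer, normally negotiated with firmware via _OSC on ACPI systems). Here is the same condition as a standalone predicate over hypothetical flattened inputs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened inputs for the condition in the patch. */
struct aer_inputs {
	bool has_aer_cap;        /* dev->aer_cap != 0 */
	bool aer_available;      /* pci_aer_available() */
	bool ports_native;       /* pcie_ports_native */
	bool bridge_native_aer;  /* pci_find_host_bridge(dev->bus)->native_aer */
};

static bool should_enable_error_reporting(const struct aer_inputs *in)
{
	assert(in != NULL);
	return in->has_aer_cap && in->aer_available &&
	       (in->ports_native || in->bridge_native_aer);
}
```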

I plan to apply this after we clarify the changelog a bit, but I don't
really like this patch because both it and the corresponding code added by
2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port
initialization") seem a little out of place.

The way I think this *should* work is that the PCI core should arrange to
handle AER interrupts when it enumerates the devices that can generate
them (Root Ports and Root Complex Event Collectors), even before it
enumerates the devices below the Root Port.

Then the PCI core could directly enable the AER interrupts on all devices
as it enumerates them.  I would envision both cases being handled somewhere
like pci_aer_init() in pci_init_capabilities().
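Here is a rough sketch of that direction. It assumes that AER ownership is already known when each device is enumerated and that portdrv no longer clears the enables. enumerate_device() and aer_init_device() are hypothetical stand-ins for pci_init_capabilities() and something like pci_aer_init(). Boot-time and hot-added devices go through the same per-device step, so there is no separate bus walk for later probes to undo or miss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model. */
struct dev {
	bool has_aer_cap;
	bool err_reporting;	/* stands in for the DEVCTL enables */
};

/* Hypothetical per-device AER hook: enable reporting if the device has
 * an AER capability and the OS owns AER. */
static void aer_init_device(struct dev *d, bool os_owns_aer)
{
	assert(d != NULL);
	if (d->has_aer_cap && os_owns_aer)
		d->err_reporting = true;
}

/* Hypothetical enumeration step, called once per device whether it was
 * found at boot or hot-added. */
static void enumerate_device(struct dev *d, bool os_owns_aer)
{
	aer_init_device(d, os_owns_aer);
	/* other capability setup would follow here */
}
```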

This would also allow us to get rid of the pci_enable_pcie_error_reporting()
calls that are currently sprinkled around in drivers, because that would be
handled by the core for all devices.

Bjorn
