lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231012164336.GA1072823@bhelgaas>
Date:   Thu, 12 Oct 2023 11:43:36 -0500
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Siddharth Vadapalli <s-vadapalli@...com>
Cc:     lpieralisi@...nel.org, robh@...nel.org, kw@...ux.com,
        bhelgaas@...gle.com, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        r-gunasekaran@...com, srk@...com
Subject: Re: [PATCH] PCI: keystone: Don't enable BAR0 if link is not detected

On Thu, Oct 12, 2023 at 10:15:09AM +0530, Siddharth Vadapalli wrote:
> Hello Bjorn,
> 
> Thank you for reviewing the patch.
> 
> On 11/10/23 19:16, Bjorn Helgaas wrote:
> > Hi Siddharth,
> > 
> > On Wed, Oct 11, 2023 at 06:04:51PM +0530, Siddharth Vadapalli wrote:
> >> Since the function dw_pcie_host_init() ignores the absence of link under
> >> the assumption that link can come up later, it is possible that the
> >> pci_host_probe(bridge) function is invoked even when no endpoint device
> >> is connected. In such a situation, the ks_pcie_v3_65_add_bus() function
> >> configures BAR0 when the link is not up, resulting in Completion Timeouts
> >> during the MSI configuration performed later by the PCI Express Port driver
> >> to setup AER, PME and other services. Thus, leave BAR0 disabled if link is
> >> not yet detected when the ks_pcie_v3_65_add_bus() function is invoked.
> > 
> > I'm trying to make sense of this.  In this path:
> > 
> >   pci_host_probe
> >     pci_scan_root_bus_bridge
> >       pci_register_host_bridge
> > 	bus = pci_alloc_bus(NULL)    # root bus
> > 	bus->ops->add_bus(bus)
> > 	  ks_pcie_v3_65_add_bus
> > 
> > The BAR0 in question must belong to a Root Port.  And it sounds like
> > the issue must be related to MSI-X, since the original MSI doesn't
> > involve any BARs.
> 
> Yes, the issue is related to MSI-X. I will list down the exact set of function
> calls below as well as the place where the completion timeout first occurs:
> ks_pcie_probe
>   dw_pcie_host_init
>     pci_host_probe
>       pci_bus_add_devices
>         pci_bus_add_device
>           device_attach
>             __device_attach
>               bus_for_each_drv
>                 __device_attach_driver (invoked using fn(drv, data))
>                   driver_probe_device
>                     __driver_probe_device
>                       really_probe
>                         pci_device_probe
>                           pcie_portdrv_probe
>                             pcie_port_device_register
>                               pcie_init_service_irqs
>                                 pcie_port_enable_irq_vec
>                                   pci_alloc_irq_vectors
>                                     pci_alloc_irq_vectors_affinity
>                                       __pci_enable_msix_range
>                                         msix_capability_init
>                                           msix_setup_interrupts
>                                             msix_setup_msi_descs
>                                               msix_prepare_msi_desc
> In this function: msix_prepare_msi_desc, the following readl()
> causes completion timeout:
> 		desc->pci.msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
> The completion timeout with the readl is only observed when the link
> is down (No Endpoint device is actually connected to the PCIe
> connector slot).

Do you know the address ("addr")?  From pci_msix_desc_addr(), it looks
like it should be:

  desc->pci.mask_base + desc->msi_index * PCI_MSIX_ENTRY_SIZE

and desc->pci.mask_base should be dev->msix_base, which we got from
msix_map_region(), which ioremaps part of the BAR indicated by the
MSI-X Table Offset/Table BIR register.

I wonder if this readl() is being handled as an MMIO access to a
downstream device instead of a Root Port BAR access because it's
inside the Root Port's MMIO window.

Could you dump out these values just before the readl()?

  phys_addr inside msix_map_region()
  dev->msix_base
  desc->pci.mask_base
  desc->msi_index
  addr
  call early_dump_pci_device() on the Root Port

Bjorn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ