lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 21 May 2022 11:43:03 -0500
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Jim Quinlan <jim2101024@...il.com>
Cc:     linux-pci@...r.kernel.org,
        Nicolas Saenz Julienne <nsaenz@...nel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>, james.dutton@...il.com,
        kibi@...ian.org, bcm-kernel-feedback-list@...adcom.com,
        james.quinlan@...adcom.com,
        Florian Fainelli <f.fainelli@...il.com>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Rob Herring <robh@...nel.org>,
        Krzysztof WilczyƄski <kw@...ux.com>,
        "moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE" 
        <linux-rpi-kernel@...ts.infradead.org>,
        "moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE" 
        <linux-arm-kernel@...ts.infradead.org>,
        linux-kernel@...r.kernel.org,
        "Rafael J. Wysocki" <rjw@...ysocki.net>, linux-pm@...r.kernel.org
Subject: Re: [PATCH v1] PCI: brcmstb: Fix regression regarding missing PCIe
 linkup

[+cc Rafael, linux-pm because I think there are interesting power
management questions here]

On Wed, May 18, 2022 at 03:42:11PM -0400, Jim Quinlan wrote:
> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> 
> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> endpoint node described in [2] was missing, no linkup would be attempted,
> and subsequent accesses would cause a panic because this particular PCIe HW
> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
> 
> We fix this by allowing the DT endpoint subnode to be missing.  This is
> important for platforms like the CM4 which have a standard PCIe socket and
> the endpoint device is unknown.

I think the problem here is that on the CM, we try to enumerate
devices that are not powered up, isn't it?  The commit log should say
something about that power situation and how the driver learns about
the power regulators instead of just pointing at an DT endpoint node.

I guess the intent of this patch is to turn on the power to downstream
devices before enumerating them?  What happens if we turn on the power
but don't find any downstream devices?  From looking at the code, I
assume we just leave the power on.  Maybe that's what you want, I
dunno.

I added Rafael because this seems vaguely similar to runtime power
management, and if we can integrate with that somehow, I'd sure like
to avoid building a parallel infrastructure for it.

The current path we're on is to move some of this code that's
currently in pcie-brcmstb.c to the PCIe portdrv [0].  I'm a little
hesitant about that because ACPI does just fine without it.  If we're
adding new DT functionality that could not be implemented via ACPI,
that's one thing.  But I'm not convinced this is that new.

That's a longer term question.  In the short term we need to fix the
regression.  More specifics about that below.

[0] https://lore.kernel.org/r/20211110221456.11977-6-jim2101024@gmail.com

> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> 
> Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> Signed-off-by: Jim Quinlan <jim2101024@...il.com>
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index ba5c120816b2..adca74e235cb 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -540,16 +540,18 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
>  
>  static int brcm_pcie_add_bus(struct pci_bus *bus)
>  {
> -	struct device *dev = &bus->dev;
>  	struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
>  	int ret;
>  
> -	if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> +	/* Only busno==1 requires us to linkup */
> +	if ((int)bus->number != 1)
>  		return 0;
>  
>  	ret = pci_subdev_regulators_add_bus(bus);
> -	if (ret)
> +	if (ret) {
> +		pcie->refusal_mode = true;
>  		return ret;
> +	}
>  
>  	/* Grab the regulators for suspend/resume */
>  	pcie->sr = bus->dev.driver_data;

IIUC, this path:

  pci_alloc_child_bus
    brcm_pcie_add_bus                   # .add_bus method
      pci_subdev_regulators_add_bus     # in pcie-brcmstb.c for now
        alloc_subdev_regulators         # in pcie-brcmstb.c for now
        regulator_bulk_get
        regulator_bulk_enable
      brcm_pcie_linkup			# bring link up

is basically so we can leave power to downstream devices off, then
turn it on when we're ready to enumerate those downstream devices.

I think the brcmstb root bus is always bus 0, it only has a single
Root Port on the root bus, and it always leads to bus 1, so it sort of
makes sense that we only need to turn on power when we're about to
scan "bus->number == 1".

But this power management seems like a pattern that other controllers
will use.  Other controllers will have several Root Ports, so checking
the bus number won't work for them.  Instead of checking the bus
number, I think brcmstb should check more directly for a power
regulator.

Tangent 1: I think this means a downstream device goes from D3cold to
D0uninitialized?  Does this code account for the required delays
accesses to the device?  I see some in brcm_pcie_linkup(), but I don't
see anything that looks like Tpvperl (the time PERST# must remain
asserted after power becomes valid) or Tperst (when asserted, PERST#
must remain asserted at least this long) (both from PCIe r6.0, sec
6.6.1).

Tangent 2: "brcm_pcie_link_up()" makes sense -- it's the conventional
name for the simple boolean function that tells us whether the link is
up.  "brcm_pcie_linkup()", which *brings* the link up, is confusing
because it's too similar to "brcm_pcie_link_up()".  The conventional
name for this would be "brcm_pcie_start_link()".

Tangent 3: There are fewer than 20 forward function declarations in
drivers/pci/controller/, and 9 of them are in pcie-brcmstb.c.  It's a
lot easier to maintain all these drivers if they use a common style.
Generally speaking, Linux code orders function definitions to avoid
the need for forward declarations.

Bjorn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ