linux-kernel - Re: [RFC HACK PATCH] PCI: dwc: layerscape: Hack around enumeration problems with Honeycomb LX2K

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL_JsqKQxFvkFtph1BZD2LKdZjboxhMTWkZe_AWS-vMD9y0pMw@mail.gmail.com>
Date:   Fri, 11 Dec 2020 08:37:40 -0600
From:   Rob Herring <robh@...nel.org>
To:     Daniel Thompson <daniel.thompson@...aro.org>
Cc:     Minghuan Lian <minghuan.Lian@....com>,
        Mingkai Hu <mingkai.hu@....com>, Roy Zang <roy.zang@....com>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Jon Nettleton <jon@...id-run.com>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        PCI <linux-pci@...r.kernel.org>,
        linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linaro Patches <patches@...aro.org>
Subject: Re: [RFC HACK PATCH] PCI: dwc: layerscape: Hack around enumeration
 problems with Honeycomb LX2K

On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson
<daniel.thompson@...aro.org> wrote:
>
> I have been chasing down a problem enumerating an NVMe drive on a
> Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate
> successfully if the we are emitting lots of console messages via a UART.
> If the system is booted with `quiet` set then enumeration fails.

We really need to get away from work-arounds for device X on host Y. I
suspect they are not limited to that combination only...

How exactly does it fail to enumerate? There's a (racy) linkup check
on config accesses, is it reporting link down and skipping config
accesses?

> I guessed this would be due to the timing impact of printk-to-UART and
> tried to find out where a delay could be added to provoke a successful
> enumeration.
>
> This patch contains the results. The delay length (1ms) was selected by
> binary searching downwards until the delay was not effective for these
> devices (Honeycomb LX2K and a Western Digital WD Blue SN550).
>
> I have also included the workaround twice (conditionally compiled). The
> first change is the *latest* possible code path that we can deploy a
> delay whilst the second is the earliest place I could find.
>
> The summary is that the critical window were we are currently relying on
> a call to the console UART code can "mend" the driver runs from calling
> dw_pcie_setup_rc() in host init to just before we read the state in the
> link up callback.
>
> Signed-off-by: Daniel Thompson <daniel.thompson@...aro.org>
> ---
>
> Notes:
>     This patch is RFC (and HACK) because I don't have much clue *why* this
>     patch works... merely that this is the smallest possible change I need
>     to replicate the current accidental printk() workaround.  Perhaps one
>     could argue that RFC here stands for request-for-clue.  All my
>     observations and changes here are empirical and I don't know how best to
>     turn them into something that is not a hack!
>
>     BTW I noticed many other pcie-designware drivers take advantage
>     of a function called dw_pcie_wait_for_link() in their init paths...
>     but my naive attempts to add it to the layerscape driver results
>     in non-booting systems so I haven't embarrassed myself by including
>     that in the patch!

You need to look at what's pending for v5.11, because I reworked this
to be more unified. The ordering of init is also possibly changed. The
sequence is now like this:

        dw_pcie_setup_rc(pp);
        dw_pcie_msi_init(pp);

        if (!dw_pcie_link_up(pci) && pci->ops->start_link) {
                ret = pci->ops->start_link(pci);
                if (ret)
                        goto err_free_msi;
        }

        /* Ignore errors, the link may come up later */
        dw_pcie_wait_for_link(pci);


>  drivers/pci/controller/dwc/pci-layerscape.c | 35 +++++++++++++++++++++
>  1 file changed, 35 insertions(+)
>
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
> index f24f79a70d9a..c354904b90ef 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -22,6 +22,8 @@
>
>  #include "pcie-designware.h"
>
> +#define WORKAROUND_LATEST_POSSIBLE
> +
>  /* PEX1/2 Misc Ports Status Register */
>  #define SCFG_PEXMSCPORTSR(pex_idx)     (0x94 + (pex_idx) * 4)
>  #define LTSSM_STATE_SHIFT      20
> @@ -113,10 +115,31 @@ static int ls_pcie_link_up(struct dw_pcie *pci)
>         struct ls_pcie *pcie = to_ls_pcie(pci);
>         u32 state;
>
> +       /*
> +        * Strictly speaking *this* (before the ioread32) is the latest
> +        * point a simple delay can be effective. If we move the delay
> +        * after the ioread32 then the NVMe does not enumerate.
> +        *
> +        * However this function appears to be frequently called so an
> +        * unconditional delay here causes noticeable delay at boot
> +        * time. Hence we implement the workaround by retrying the read
> +        * after a short delay if we think we might need to return false.
> +        */
> +
>         state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >>
>                  pcie->drvdata->ltssm_shift) &
>                  LTSSM_STATE_MASK;
>
> +#ifdef WORKAROUND_LATEST_POSSIBLE
> +       if (state < LTSSM_PCIE_L0) {
> +               /* see comment above */
> +               mdelay(1);
> +               state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >>
> +                        pcie->drvdata->ltssm_shift) &
> +                        LTSSM_STATE_MASK;

Side note, I really wonder about the variety of ways in DWC drivers we
have to check link state. The default is a debug register... Are the
standard PCIe registers for this really broken in these cases?

> +       }
> +#endif
> +
>         if (state < LTSSM_PCIE_L0)
>                 return 0;
>
> @@ -152,6 +175,18 @@ static int ls_pcie_host_init(struct pcie_port *pp)
>
>         dw_pcie_setup_rc(pp);

Might be related to the PORT_LOGIC_SPEED_CHANGE we do in here.

> +#ifdef WORKAROUND_EARLIEST_POSSIBLE
> +       /*
> +        * This is the earliest point the delay is effective.
> +        * If we move it before dw_pcie_setup_rc() then the
> +        * NVMe does not enumerate.
> +        *
> +        * 500us is too short to reliably work around the issue
> +        * hence adopting 1000us here.
> +        */
> +       mdelay(1);
> +#endif
> +
>         return 0;
>  }
>
>
> base-commit: 0477e92881850d44910a7e94fc2c46f96faa131f
> --
> 2.29.2
>