lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 10 May 2023 15:47:58 -0500
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Peter Geis <pgwipeout@...il.com>
Cc:     Vincenzo Palazzo <vincenzopalazzodev@...il.com>,
        linux-pci@...r.kernel.org, robh@...nel.org, heiko@...ech.de,
        kw@...ux.com, shawn.lin@...k-chips.com,
        linux-kernel@...r.kernel.org, lgirdwood@...il.com,
        linux-rockchip@...ts.infradead.org, broonie@...nel.org,
        bhelgaas@...gle.com,
        linux-kernel-mentees@...ts.linuxfoundation.org,
        lpieralisi@...nel.org, linux-arm-kernel@...ts.infradead.org,
        Dan Johansen <strit@...jaro.org>
Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for
 Rockchip PCIe bus scan

On Tue, May 09, 2023 at 08:11:29PM -0400, Peter Geis wrote:
> On Tue, May 9, 2023 at 5:19 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
> > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> > > Add a configurable delay to the Rockchip PCIe driver to address
> > > crashes that occur on some old devices, such as the Pine64 RockPro64.
> > >
> > > This issue is affecting the ARM community, but there is no
> > > upstream solution for it yet.
> >
> > It sounds like this happens with several endpoints, right?  And I
> > assume the endpoints work fine in other non-Rockchip systems?  If
> > that's the case, my guess is the problem is with the Rockchip host
> > controller and how it's initialized, not with the endpoints.
> > ...

> The main issue with the rk3399 is the PCIe controller is buggy and
> triggers a SoC panic when certain error conditions occur that should
> be handled gracefully. One of those conditions is when an endpoint
> requests an access to wait and retry later.

I assume this refers to a Completion with Request Retry Status (RRS)?

> Many years ago we ran that issue to ground and with Robin Murphy's
> help we found that while it's possible to gracefully handle that
> condition it required hijacking the entire arm64 error handling
> routine. Not exactly scalable for just one SoC.

Do you have a pointer to that discussion?  The URL might save
repeating the whole exercise and could be useful for the commit log
when we try to resolve this.

> The configurable waits allow us to program reasonable times for
> 90% of the endpoints that come up in the normal amount of time, while
> being able to adjust it for the other 10% that do not. Some require
> multiple seconds before they return without error. Part of the reason
> we don't want to hardcode the wait time is because the probe isn't
> handled asynchronously, so the kernel appears to hang while waiting
> for the timeout.

Is there some way for users to figure out that they would need this
property?  Or is it just "if your kernel panics on boot, try
adding or increasing "bus-scan-delay-ms" in your DT?

Bjorn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ