linux-kernel - Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

Message-Id: <CSMSVO8Z73NV.3MX3FRNO026T9@vincent-arch>
Date:   Mon, 15 May 2023 13:04:34 +0200
From:   "Vincenzo Palazzo" <vincenzopalazzodev@...il.com>
To:     "Peter Geis" <pgwipeout@...il.com>,
        "Bjorn Helgaas" <helgaas@...nel.org>
Cc:     <kw@...ux.com>, <heiko@...ech.de>, <robh@...nel.org>,
        <linux-pci@...r.kernel.org>, <shawn.lin@...k-chips.com>,
        <linux-kernel@...r.kernel.org>, <lgirdwood@...il.com>,
        <linux-rockchip@...ts.infradead.org>, <broonie@...nel.org>,
        <bhelgaas@...gle.com>,
        <linux-kernel-mentees@...ts.linuxfoundation.org>,
        <lpieralisi@...nel.org>, <linux-arm-kernel@...ts.infradead.org>,
        "Dan Johansen" <strit@...jaro.org>,
        "Catalin Marinas" <catalin.marinas@....com>,
        "Will Deacon" <will@...nel.org>,
        "Robin Murphy" <robin.murphy@....com>
Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for
 Rockchip PCIe bus scan

> >
> > There *is* a way for a PCIe device to say "I need more time".  It does
> > this by responding to that Vendor ID config read with Request Retry
> > Status (RRS, aka CRS in older specs), which means "I'm not ready yet,
> > but I will be ready in the future."  Adding a delay would definitely
> > make a difference here, so my guess is this is what's happening.
> >
> > Most root complexes return ~0 data to the CPU when a config read
> > terminates with UR or RRS.  It sounds like rockchip does this for UR
> > but possibly not for RRS.
> >
> > There is a "RRS Software Visibility" feature, which is supposed to
> > turn the RRS into a special value (Vendor ID == 0x0001), but per [1],
> > rockchip doesn't support it (lspci calls it "CRSVisible").
> >
> > But the CPU load instruction corresponding to the config read has to
> > complete by reading *something* or else be aborted.  It sounds like
> > it's aborted in this case.  I don't know the arm64 details, but if we
> > could catch that abort and determine that it was an RRS and not a UR,
> > maybe we could fabricate the magic RRS 0x0001 value.
> >
> > imx6q_pcie_abort_handler() does something like that, although I think
> > it's for arm32, not arm64.  But obviously we already catch the abort
> > enough to dump the register state and panic, so maybe there's a way to
> > extend that?
>
> Perhaps a hook mechanism that allows drivers to register with the
> serror handler and offer to handle specific errors before the generic
> code causes the system panic?

This sounds to me like a good general solution that would also help
handle future HW like this one.
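
To make the idea concrete, here is a minimal userspace sketch of such a
hook chain. It is not kernel code: register_serror_hook(),
dispatch_serror(), struct fault_ctx, the example hook and the config
window constants are all hypothetical names invented for illustration,
and the sketch simply assumes the driver can already tell an RRS
completion from a UR, which is the open question discussed above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: none of these names exist in the kernel. */
#define MAX_HOOKS          4
#define PCI_VENDOR_ID_RRS  0x0001u /* value RRS Software Visibility reports */

/* Made-up config space window claimed by the example driver. */
#define CFG_BASE  UINT64_C(0xf8000000)
#define CFG_SIZE  UINT64_C(0x00100000)

struct fault_ctx {
	uint64_t addr;   /* address of the aborted config read */
	uint32_t result; /* value handed back to the load if a hook claims it */
};

/* A hook returns true if it handled the error, false to pass it on. */
typedef bool (*serror_hook_fn)(struct fault_ctx *ctx);

static serror_hook_fn hooks[MAX_HOOKS];
static size_t nr_hooks;

/* Drivers register a hook; returns 0 on success, -1 if rejected. */
static int register_serror_hook(serror_hook_fn fn)
{
	if (!fn || nr_hooks >= MAX_HOOKS)
		return -1;
	hooks[nr_hooks++] = fn;
	return 0;
}

/*
 * Called before the generic handling. Returns true if some driver
 * claimed the error; false means fall through to the existing
 * dump-and-panic path.
 */
static bool dispatch_serror(struct fault_ctx *ctx)
{
	for (size_t i = 0; i < nr_hooks; i++)
		if (hooks[i](ctx))
			return true;
	return false;
}

/*
 * Example driver hook. Assumption: every abort inside the config
 * window is an RRS. A real driver would have to check controller
 * status to rule out a UR before fabricating the RRS value.
 */
static bool example_pcie_hook(struct fault_ctx *ctx)
{
	if (ctx->addr < CFG_BASE || ctx->addr - CFG_BASE >= CFG_SIZE)
		return false;
	ctx->result = PCI_VENDOR_ID_RRS;
	return true;
}
```

With example_pcie_hook registered, a fault inside the window is claimed
and the load would see 0x0001, while a fault at any other address is
left unclaimed and still reaches the generic panic path.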

So this is a Concept Ack for me.

Cheers!

Vincent.