linux-kernel - Re: [PATCH v8 2/2] PCI: brcmstb: Configure HW CLKREQ# mode appropriate for downstream device

Message-ID: <CA+-6iNy03Bz1-Wftf4PpuVFF0FS01d2Yo6coG+gHqwwwpRdFMw@mail.gmail.com>
Date: Thu, 11 Jan 2024 13:20:48 -0500
From: Jim Quinlan <james.quinlan@...adcom.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: linux-pci@...r.kernel.org, Nicolas Saenz Julienne <nsaenz@...nel.org>, 
	Bjorn Helgaas <bhelgaas@...gle.com>, Lorenzo Pieralisi <lorenzo.pieralisi@....com>, 
	Cyril Brulebois <kibi@...ian.org>, Phil Elwell <phil@...pberrypi.com>, 
	bcm-kernel-feedback-list@...adcom.com, 
	Florian Fainelli <florian.fainelli@...adcom.com>, Jim Quinlan <jim2101024@...il.com>, 
	Lorenzo Pieralisi <lpieralisi@...nel.org>, Krzysztof Wilczyński <kw@...ux.com>, 
	Rob Herring <robh@...nel.org>, 
	"moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE" <linux-rpi-kernel@...ts.infradead.org>, 
	"moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE" <linux-arm-kernel@...ts.infradead.org>, 
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v8 2/2] PCI: brcmstb: Configure HW CLKREQ# mode
 appropriate for downstream device

On Thu, Jan 11, 2024 at 12:28 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Mon, Nov 13, 2023 at 01:56:06PM -0500, Jim Quinlan wrote:
> > The Broadcom STB/CM PCIe HW core, which is also used in RPi SOCs, must be
> > deliberately set by the PCIe RC HW into one of three mutually exclusive
> > modes:
> >
> > "safe" -- No CLKREQ# expected or required, refclk is always provided.  This
> >     mode should work for all devices but is not capable of any refclk
> >     power savings.
> >
> > "no-l1ss" -- CLKREQ# is expected to be driven by the downstream device for
> >     CPM and ASPM L0s and L1.  Provides Clock Power Management, L0s, and L1,
> >     but cannot provide L1 substate (L1SS) power savings. If the downstream
> >     device connected to the RC is L1SS capable AND the OS enables L1SS, all
> >     PCIe traffic may abruptly halt, potentially hanging the system.
> >
> > "default" -- Bidirectional CLKREQ# between the RC and downstream device.
> >     Provides ASPM L0s, L1, and L1SS, but not compliant to provide Clock
> >     Power Management; specifically, may not be able to meet the T_CLRon max
> >     timing of 400ns as specified in "Dynamic Clock Control", section
> >     3.2.5.2.2 of the PCIe Express Mini CEM 2.1 specification.  This
> >     situation is atypical and should happen only with older devices.
> >
> > Previously, this driver always set the mode to "no-l1ss", as almost all
> > STB/CM boards operate in this mode.  But now there is interest in
> > activating L1SS power savings from STB/CM customers, which requires "aspm"
> > mode.
>
> I think this should read "default" mode, not "aspm" mode, since "aspm"
> is not a mode implemented by this patch, right?

Correct.
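
For readers following the thread, here is a rough, standalone C sketch of
the three modes as the patch description words them, including the
"brcm,clkreq-mode" DT property and its "default" fallback when the property
is omitted. This is not the driver code: every identifier below
(clkreq_mode, clkreq_caps, clkreq_mode_from_prop, clkreq_caps_for,
clkreq_l1ss_hazard) is hypothetical, and the capability rows are only my
reading of the description, in particular that "safe" provides neither CPM
nor L1SS savings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not taken from pcie-brcmstb.c. */
enum clkreq_mode {
	CLKREQ_MODE_SAFE,	/* "safe": refclk always provided */
	CLKREQ_MODE_NO_L1SS,	/* "no-l1ss": CPM, L0s, L1, but no L1SS */
	CLKREQ_MODE_DEFAULT,	/* "default": L0s, L1, L1SS, CPM not compliant */
};

/* Refclk-related power savings per mode, as read from the description. */
struct clkreq_caps {
	bool clock_pm;		/* Clock Power Management */
	bool l1ss;		/* L1 substates */
};

static const struct {
	const char *name;
	enum clkreq_mode mode;
} clkreq_mode_names[] = {
	{ "safe",    CLKREQ_MODE_SAFE },
	{ "no-l1ss", CLKREQ_MODE_NO_L1SS },
	{ "default", CLKREQ_MODE_DEFAULT },
};

/*
 * Map a "brcm,clkreq-mode" string to a mode.  A NULL value stands for an
 * omitted property and selects "default".  Returns false for a string
 * that names no mode (for example "aspm").
 */
bool clkreq_mode_from_prop(const char *val, enum clkreq_mode *mode)
{
	size_t i;

	if (!val) {
		*mode = CLKREQ_MODE_DEFAULT;
		return true;
	}

	for (i = 0; i < sizeof(clkreq_mode_names) / sizeof(clkreq_mode_names[0]); i++) {
		if (strcmp(val, clkreq_mode_names[i].name) == 0) {
			*mode = clkreq_mode_names[i].mode;
			return true;
		}
	}
	return false;
}

/* Report which refclk power savings a mode can offer. */
struct clkreq_caps clkreq_caps_for(enum clkreq_mode mode)
{
	struct clkreq_caps caps = { false, false };

	switch (mode) {
	case CLKREQ_MODE_NO_L1SS:
		caps.clock_pm = true;
		break;
	case CLKREQ_MODE_DEFAULT:
		caps.l1ss = true;
		break;
	case CLKREQ_MODE_SAFE:
		break;
	}
	return caps;
}

/*
 * The hazard described for "no-l1ss": an L1SS-capable downstream device
 * combined with an OS that enables L1SS may halt all PCIe traffic.
 */
bool clkreq_l1ss_hazard(enum clkreq_mode mode, bool dev_l1ss_capable,
			bool os_enables_l1ss)
{
	return mode == CLKREQ_MODE_NO_L1SS && dev_l1ss_capable && os_enables_l1ss;
}
```

With this sketch, a NULL property value yields CLKREQ_MODE_DEFAULT, and the
string "aspm" is rejected, which matches the correction above that "aspm" is
not one of the three modes.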
>
>
> > In addition, a bug was filed for RPi4 CM platform because most
> > devices did not work in "no-l1ss" mode.
>
> I think this refers to bug 217276, mentioned below?

I guess you are saying I should put a footnote marker there.

>
>
> > Note that the mode is specified by the DT property "brcm,clkreq-mode".  If
> > this property is omitted, then "default" mode is chosen.
> >
> > Note: Since L1 substates are now possible, a modification was made
> > regarding an internal bus timeout: During long periods of the PCIe RC HW
> > being in an L1SS sleep state, there may be a timeout on an internal bus
> > access, even though there may not be any PCIe access involved.  Such a
> > timeout will cause a subsequent CPU abort.
>
> This sounds scary.  If a NIC is put in L1.2, does this mean we will
> see this CPU abort if there's no traffic for a long time?  What is
> needed to avoid the CPU abort?

I don't think this happens in normal practice, as there is a slew of
low-level TLPs and LTR messages that are sent on a regular basis.  The
only time this timeout occurred was when a major customer was doing a
hack: IIRC, their endpoint device has to reboot itself after link-up and
driver probe, so it goes into L1.2 to execute this reboot, and while
doing so the connection is completely silent.


>
> What does this mean for users?  L1SS is designed for long periods of
> the device being idle, so this leaves me feeling that using L1SS is
> unsafe in general.  Hopefully this impression is unwarranted, and all
> we need is some clarification here.


I don't think it will affect most users, if any.

Regards,
Jim Quinlan
Broadcom STB/CM



>
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217276
> >
> > Signed-off-by: Jim Quinlan <james.quinlan@...adcom.com>
> > Tested-by: Florian Fainelli <florian.fainelli@...adcom.com>
> > ---
> >  drivers/pci/controller/pcie-brcmstb.c | 96 ++++++++++++++++++++++++---
> >  1 file changed, 86 insertions(+), 10 deletions(-)
> > ...
