Message-ID: <20221109173454.GA567052@bhelgaas>
Date:   Wed, 9 Nov 2022 11:34:54 -0600
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Nathan Rossi <nathan@...hanrossi.com>
Cc:     linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        Nathan Rossi <nathan.rossi@...i.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "Maciej W. Rozycki" <macro@...am.me.uk>
Subject: Re: [PATCH] PCI/ASPM: Wait for data link active after retraining

[+cc Maciej for similar retrain issue]

On Tue, Nov 08, 2022 at 04:29:44PM -0600, Bjorn Helgaas wrote:
> On Thu, Jun 02, 2022 at 06:55:44AM +0000, Nathan Rossi wrote:
> > From: Nathan Rossi <nathan.rossi@...i.com>
> > 
> > When retraining the link either the child or the parent device may have
> > the data link layer state machine of the respective devices move out of
> > the active state despite the physical link training being completed.
> > Depending on how long it takes for the devices to return to the active
> > state, the device may not be ready and any further reads/writes to the
> > device can fail.
> > 
> > This issue is present with the pci-mvebu controller paired with a device
> > supporting ASPM but without advertising the Slot Clock, where during
> > boot the pcie_aspm_cap_init call would cause common clocks to be made
> > consistent and then retrain the link. However the data link layer would
> > not be active before any device initialization (e.g. ASPM capability
> > queries, BAR configuration) causing improper configuration of the device
> > without error.
> > 
> > To ensure the child device is accessible, after the link retraining use
> > pcie_wait_for_link to perform the associated state checks and any needed
> > delays.
> > 
> > Signed-off-by: Nathan Rossi <nathan.rossi@...i.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index a96b7424c9..4b8a1810be 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -288,7 +288,8 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
> >  		reg16 &= ~PCI_EXP_LNKCTL_CCC;
> >  	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> >  
> > -	if (pcie_retrain_link(link))
> > +	/* Retrain link and then wait for the link to become active */
> > +	if (pcie_retrain_link(link) && pcie_wait_for_link(parent, true))
> 
> pcie_retrain_link() waits for PCI_EXP_LNKSTA_LT (Link Training) to be
> cleared, which means the LTSSM has exited the Configuration/Recovery
> state.  pcie_wait_for_link() waits for PCI_EXP_LNKSTA_DLLLA (Data Link
> Layer Link Active) to be set, which means the link is in DL_Active.
> 
> I don't see an explicit procedure in the spec for determining when
> a link retrain is complete, but from PCIe r6.0, sec 6.2.11 (DPC):
> 
>   After software releases the Downstream Port from DPC, the Port’s
>   LTSSM must transition to the Detect state, where the Link will
>   attempt to retrain. Software can use Data Link Layer State Changed
>   interrupts, DL_ACTIVE ERR_COR signaling, or both, to signal when the
>   Link reaches the DL_Active state again.
> 
> and sec 6.6:
> 
>   On the completion of Link Training (entering the DL_Active state,
>   see Section 3.2), a component must be able to receive and process
>   TLPs and DLLPs.
> 
> The only use mentioned in the spec for the Link Training bit is the
> implementation note in sec 7.5.3.7 about avoiding race conditions when
> using the Retrain Link bit, where software should poll Link Training
> until it returns to zero before setting the Retrain Link bit to change
> link parameters.
> 
> And I think you're absolutely right that what we *want* here is the
> data link layer DL_Active state, not just the link layer L0 state.
> 
> This all makes me think that checking the Link Training bit might be
> the wrong thing to begin with.
> 
> Of course, the Data Link Layer Link Active bit wasn't added until PCIe
> r1.1, and even now it's optional.  Without it, I don't know if there's
> a way to make sure the link is in DL_Active.
> 
> Maybe pcie_retrain_link() should wait for Data Link Layer Link Active
> if it is supported, and use the existing behavior of waiting for Link
> Training to be cleared otherwise?
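
As a rough illustration of that idea (a userspace model only, not a
proposed patch), the sketch below polls a simulated Link Status register
and picks the completion condition based on whether Link Capabilities
advertises Data Link Layer Link Active Reporting.  struct mock_port,
mock_read_lnksta() and mock_wait_for_retrain() are hypothetical names
that exist only for this example; the three bit definitions are the ones
in include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKCAP_DLLLARC	0x00100000 /* DLL Link Active Reporting Capable */
#define PCI_EXP_LNKSTA_LT	0x0800     /* Link Training */
#define PCI_EXP_LNKSTA_DLLLA	0x2000     /* Data Link Layer Link Active */

/* Hypothetical stand-in for a Downstream Port; not a kernel structure. */
struct mock_port {
	uint32_t lnkcap;	/* simulated Link Capabilities register */
	int lt_clear_after;	/* LT reads as set for this many polls */
	int dl_active_after;	/* DLLLA reads as clear for this many polls */
	int polls;		/* number of Link Status reads so far */
};

/* Simulated Link Status read; each call counts as one poll. */
uint16_t mock_read_lnksta(struct mock_port *port)
{
	uint16_t sta = 0;

	port->polls++;
	if (port->polls <= port->lt_clear_after)
		sta |= PCI_EXP_LNKSTA_LT;
	/* DLLLA is hardwired to 0 when reporting is not implemented. */
	if ((port->lnkcap & PCI_EXP_LNKCAP_DLLLARC) &&
	    port->polls > port->dl_active_after)
		sta |= PCI_EXP_LNKSTA_DLLLA;
	return sta;
}

/*
 * Wait for DLLLA to be set if the port can report it; otherwise fall
 * back to waiting for LT to clear.  Returns true if the chosen
 * condition was met within max_polls reads.
 */
bool mock_wait_for_retrain(struct mock_port *port, int max_polls)
{
	bool has_dllla;
	int i;

	assert(port != NULL);
	has_dllla = (port->lnkcap & PCI_EXP_LNKCAP_DLLLARC) != 0;

	for (i = 0; i < max_polls; i++) {
		uint16_t sta = mock_read_lnksta(port);

		if (has_dllla) {
			if (sta & PCI_EXP_LNKSTA_DLLLA)
				return true;
		} else if (!(sta & PCI_EXP_LNKSTA_LT)) {
			return true;
		}
	}
	return false;
}
```

With a port that reports DLLLA, the loop keeps polling past the point
where LT clears; without it, the loop stops as soon as LT clears, which
matches the existing behavior.  Real code would presumably read Link
Status with pcie_capability_read_word() and use a time-based timeout
rather than a poll count.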

Nathan, I meant to cc you on this thread, which is doing something
very similar.  Just FYI:

https://lore.kernel.org/all/alpine.DEB.2.21.2209130050380.60554@angie.orcam.me.uk/
