lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190425222756.GR2754@voidbox.localdomain>
Date:   Fri, 26 Apr 2019 00:27:57 +0200
From:   Remi Pommarel <repk@...plefau.lt>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
        Ellie Reeves <ellierevves@...il.com>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v2] PCI: aardvark: Use LTSSM state to build link training
 flag

Hi Bjorn,

On Thu, Apr 25, 2019 at 04:04:39PM -0500, Bjorn Helgaas wrote:
> Hi Remi,
> 
> On Sat, Mar 16, 2019 at 05:12:43PM +0100, Remi Pommarel wrote:
> > The PCI_EXP_LNKSTA_LT flag in the emulated root device's PCI_EXP_LNKSTA
> > config register does not reflect the actual link training state and is
> > always cleared. The Link Training and Status State Machine (LTSSM) flag
> > in LMI config register could be used as a link training indicator.
> 
> LMI?  I assume this is an Aardvark-specific register?  Maybe "Aardvark
> LMI register", since the other things here are generic PCIe registers?

Yes this is an Aardvark specific register, I would have said that LMI
stands for LTSSM management interface, but this is only guessing as I
don't have access to a datasheet.

> Is this a hardware erratum?  I know advk does some software emulation,
> but it looks like the Aardvark PCIE_CORE_PCIEXP_CAP + PCI_EXP_LNKSTA
> register is awfully close to being exactly the PCIe-defined
> PCI_EXP_LNKSTA, so the difference seems like a mistake.

Vendor says that PCI_EXP_LNKSTA_LT bit in PCI_EXP_LNKSTA is marked as
"reserved" and is not implemented. It does look like a mistake to me.

> 
> > Indeed if the LTSSM is in L0 or upper state then link training has
> > completed (see [1]).
> > 
> > Unfortunately because setting the PCI_EXP_LINCTL_RL flag does not
> 
> s/PCI_EXP_LINCTL_RL/PCI_EXP_LNKCTL_RL/
> 
> > instantly imply a LTSSM state change (e.g. L0s to recovery state
> > transition takes some time), LTSSM can be in L0 but link training has
> > not finished yet. Thus a lower L0 LTSSM state followed by a L0 or upper
> > state sequence has to be seen to be sure that link training has been
> > done.
> > 
> > Because one may not call a pcie conf register read on LNKSTA after
> > doing a retrain link or may miss the link down state due to timing, a
> > 20ms timeout is used. Passing this timeout link is considered retrained.
> 
> It sounds like reading and/or writing some registers during a retrain
> causes some sort of EL1 error?  Is this a separate erratum?  Is there
> a list of the registers and operations (read/write) that are affected?
> The backtrace below suggests that it's actually a read of LNKCAP or
> LNKCTL (not LNKSTA) that caused the error.

IIUC, the backtrace below produces an EL1 error when doing a PIO
transfer while the link is still retraining. See my comment below for
more about that. But accessing any root complex's register seems fine.
> 
> It sounds like there are really two problems:
> 
>   1) Reading PCI_EXP_LNKSTA (or the Aardvark equivalent) doesn't give
>      valid data for PCI_EXP_LNKSTA_LT.

The 1) is correct.

> 
>   2) Sometimes config reads cause EL1 errors.

Actually EL1 error happens when we try to access device's register with
a PIO transfer, which is when we try to use the link while it is being
retrained.

IMHO, 1) and 2) are linked. ASPM core tries to use the link too early
because it has read invalid data for PCI_EXP_LNKSTA_LT.

> 
> If that's the case and if it's possible, can you split this into a
> patch for each issue?
> 
> > This fixes boot hang or kernel panic with the following callstack due to
> > ASPM setup doing a link re-train and polling for PCI_EXP_LNKSTA_LT flag
> > to be cleared before using it.
> > 
> > -------------------- 8< -------------------
> > 	[    0.915389]  dump_backtrace+0x0/0x140
> > 	[    0.915391]  show_stack+0x14/0x20
> > 	[    0.915393]  dump_stack+0x90/0xb4
> > 	[    0.915394]  panic+0x134/0x2c0
> > 	[    0.915396]  nmi_panic+0x6c/0x70
> > 	[    0.915398]  arm64_serror_panic+0x74/0x80
> > 	[    0.915400]  is_valid_bugaddr+0x0/0x8
> > 	[    0.915402]  el1_error+0x7c/0xe4
> > 	[    0.915404]  advk_pcie_rd_conf+0x4c/0x250
> > 	[    0.915406]  pci_bus_read_config_word+0x7c/0xd0
> > 	[    0.915408]  pcie_capability_read_word+0x90/0xc8
> > 	[    0.915410]  pcie_get_aspm_reg+0x68/0x118
> > 	[    0.915412]  pcie_aspm_init_link_state+0x460/0xa98
> 
> This backtrace doesn't make sense to me as being related to this
> issue.  You said above that the bug was that PCI_EXP_LNKSTA_LT is not
> updated.  But apparently even *reading* a register at the wrong time
> causes this EL1 error.  And pcie_get_aspm_reg() doesn't even read
> LNKSTA; it only reads LNKCAP and LNKCTL.

In fact here, advk_pcie_rd_conf+0x4c seems to be this line:
	advk_writel(pcie, 0, PIO_START);

So EL1 error seems not to happen while accessing root complex registery
but while doing a PIO transfer using the link.

> 
> BTW, if you're including a backtrace in a commit log, you can strip
> out the timestamps and the "cut" lines because they don't contribute
> information that's relevant in this context.
> 
> > 	[    0.915414]  pci_scan_slot+0xe8/0x100
> > 	[    0.915416]  pci_scan_child_bus_extend+0x50/0x288
> > 	[    0.915418]  pci_scan_bridge_extend+0x348/0x4f0
> > 	[    0.915420]  pci_scan_child_bus_extend+0x1dc/0x288
> > 	[    0.915423]  pci_scan_root_bus_bridge+0xc4/0xe0
> > 	[    0.915424]  pci_host_probe+0x14/0xa8
> > 	[    0.915426]  advk_pcie_probe+0x838/0x910
> > 	[...]
> > -------------------- 8< -------------------
> > 
> > [1] "PCI Express Base Specification", REV. 2.1
> >     PCI Express, March 4 2009, Table 4-7
> > 
> > Signed-off-by: Remi Pommarel <repk@...plefau.lt>
> > ---
> > Changes since v1:
> >   - Rename retraining flag field
> >   - Fix DEVCTL register writing
> > ---
> >  drivers/pci/controller/pci-aardvark.c | 33 ++++++++++++++++++++++++++-
> >  1 file changed, 32 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > index eb58dfdaba1b..47b707b5fc2c 100644
> > --- a/drivers/pci/controller/pci-aardvark.c
> > +++ b/drivers/pci/controller/pci-aardvark.c
> > @@ -180,6 +180,7 @@
> >  #define LINK_WAIT_MAX_RETRIES		10
> >  #define LINK_WAIT_USLEEP_MIN		90000
> >  #define LINK_WAIT_USLEEP_MAX		100000
> > +#define LINK_RETRAIN_DELAY_MAX		(20 * HZ / 1000) /* 20 ms */
> >  
> >  #define MSI_IRQ_NUM			32
> >  
> > @@ -199,6 +200,8 @@ struct advk_pcie {
> >  	u16 msi_msg;
> >  	int root_bus_nr;
> >  	struct pci_bridge_emul bridge;
> > +	unsigned long rl_deadline; /* Retrain link jiffies deadline */
> > +	u8 rl_asked; /* Retraining has been asked and is in transition */
> >  };
> >  
> >  static inline void advk_writel(struct advk_pcie *pcie, u32 val, u64 reg)
> > @@ -400,6 +403,19 @@ static int advk_pcie_wait_pio(struct advk_pcie *pcie)
> >  	return -ETIMEDOUT;
> >  }
> >  
> > +static int advk_pcie_link_retraining(struct advk_pcie *pcie)
> > +{
> > +	if (!advk_pcie_link_up(pcie)) {
> > +		pcie->rl_asked = 0;
> > +		return 1;
> > +	}
> > +
> > +	if (pcie->rl_asked && time_before(jiffies, pcie->rl_deadline))
> > +		return 1;
> > +
> > +	pcie->rl_asked = 0;
> > +	return 0;
> > +}
> >  
> >  static pci_bridge_emul_read_status_t
> >  advk_pci_bridge_emul_pcie_conf_read(struct pci_bridge_emul *bridge,
> > @@ -426,11 +442,19 @@ advk_pci_bridge_emul_pcie_conf_read(struct pci_bridge_emul *bridge,
> >  		return PCI_BRIDGE_EMUL_HANDLED;
> >  	}
> >  
> > +	case PCI_EXP_LNKCTL: {
> 
> Don't you mean PCI_EXP_LNKSTA here?

No, PCI_EXP_LNKSTA and PCI_EXP_LNKCTL are consecutive 16bit registers
but bridge emulation accesses those registers by 32bit chunk. So when
one wants to read PCI_EXP_LNKSTA register, pci bridge reads 32bit data
from PCI_EXP_LNKCTL and 16 bit shift the result to the right.

This is why I use (PCI_EXP_LNKSTA_LT << 16) below.

> 
> > +		u32 val = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + reg) &
> > +			~(PCI_EXP_LNKSTA_LT << 16);
> > +		if (advk_pcie_link_retraining(pcie))
> > +			val |= (PCI_EXP_LNKSTA_LT << 16);
> > +		*value = val;
> > +		return PCI_BRIDGE_EMUL_HANDLED;
> > +	}
> > +
> >  	case PCI_CAP_LIST_ID:
> >  	case PCI_EXP_DEVCAP:
> >  	case PCI_EXP_DEVCTL:
> >  	case PCI_EXP_LNKCAP:
> > -	case PCI_EXP_LNKCTL:
> 
> If you did mean PCI_EXP_LNKSTA above, I suppose you would leave
> PCI_EXP_LNKCTL here?
> 
> >  		*value = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + reg);
> >  		return PCI_BRIDGE_EMUL_HANDLED;
> >  	default:
> > @@ -447,8 +471,15 @@ advk_pci_bridge_emul_pcie_conf_write(struct pci_bridge_emul *bridge,
> >  
> >  	switch (reg) {
> >  	case PCI_EXP_DEVCTL:
> > +		advk_writel(pcie, new, PCIE_CORE_PCIEXP_CAP + reg);
> > +		break;
> 
> What's the purpose of this DEVCTL change?  Could it be a separate patch?
> I can't tell that it's related to the PCI_EXP_LNKSTA_LT issue.

In fact that is the computed diff that is a bit weird here. DEVCTL does
not really change, the advk_writel() stays the same as it was before
the change.

I have actually just modified the PCI_EXP_LNKCTL case.

> 
> >  	case PCI_EXP_LNKCTL:
> >  		advk_writel(pcie, new, PCIE_CORE_PCIEXP_CAP + reg);
> > +		if (new & PCI_EXP_LNKCTL_RL) {
> > +			pcie->rl_asked = 1;
> > +			pcie->rl_deadline = jiffies + LINK_RETRAIN_DELAY_MAX;
> > +		}
> >  		break;
> >  
> >  	case PCI_EXP_RTCTL:
> > -- 
> > 2.20.1
> > 
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@...ts.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ