lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAd53p7ZwYNau1c=SDpGd+cqP2qO_7km9Q3-bow-Jqzo6STVFA@mail.gmail.com>
Date: Fri, 12 Jan 2024 13:14:42 +0800
From: Kai-Heng Feng <kai.heng.feng@...onical.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: adrian.hunter@...el.com, ulf.hansson@...aro.org, 
	Victor Shih <victor.shih@...esyslogic.com.tw>, Ben Chuang <benchuanggli@...il.com>, 
	linux-mmc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mmc: sdhci-pci-gli: GL975x: Mask rootport's replay
 timer timeout during suspend

On Sat, Jan 6, 2024 at 5:19 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Thu, Dec 21, 2023 at 11:21:47AM +0800, Kai-Heng Feng wrote:
> > Spamming `lspci -vv` can still observe the replay timer timeout error
> > even after commit 015c9cbcf0ad ("mmc: sdhci-pci-gli: GL9750: Mask the
> > replay timer timeout of AER"), albeit with a lower reproduce rate.
>
> I'm not sure what this is telling me.  By "spamming `lspci -vv`, do
> you mean that if you run lspci continually, you still see Replay Timer
> Timeout logged, e.g.,
>
>   CESta:        ... Timeout+

Yes it's logged and the AER IRQ is raised.

>
> 015c9cbcf0ad uses hard-coded PCI_GLI_9750_CORRERR_MASK offset and
> PCI_GLI_9750_CORRERR_MASK_REPLAY_TIMER_TIMEOUT value, which look like
> they *could* be PCI_ERR_COR_MASK and PCI_ERR_COR_REP_TIMER, but
> without the lspci output I can't tell for sure.  If they are, it would
> be nice to use the generic macros instead of defining new ones so it's
> easier to analyze PCI_ERR_COR_MASK usage.
>
> If 015c9cbcf0ad is updating the generic PCI_ERR_COR_MASK, it should
> only prevent sending ERR_COR.  It should not affect the *logging* in
> PCI_ERR_COR_STATUS (see PCIe r6.0, sec 6.2.3.2.2), so it shouldn't
> affect the lspci output.

PCI_GLI_9750_CORRERR_MASK is specific to GLI 975x devices, so it
doesn't conform to generic PCI_ERR_COR_STATUS behavior.

The Timeout is masked with or without commit 015c9cbcf0ad:
CEMsk:  ... Timeout+


>
> > Such AER interrupt can still prevent the system from suspending, so let
> > root port mask and unmask replay timer timeout during suspend and
> > resume, respectively.
>
> 015c9cbcf0ad looks like it masks PCI_ERR_COR_REP_TIMER in the gl975x
> Endpoint, while this patch masks it in the upstream bridge (which
> might be either a Root Port or a Switch Downstream Port, so the
> subject and this sentence are not quite right).

OK, will change it to upstream bridge in next revision.

>
> 015c9cbcf0ad says it is related to a hardware defect, and maybe this
> patch is also (mention it if so).  Both patches can prevent ERR_COR
> messages and the eventual AER interrupt, depending on whether the
> Downstream Port or the Endpoint detects the Replay Timer Timeout.
> Maybe this should have a Fixes: tag for 015c9cbcf0ad to try to keep
> these together?

Sure. This patch is intend to cover more ground based on 015c9cbcf0ad.


>
> If 015c9cbcf0ad is actually updating PCI_ERR_COR_MASK, it would be
> nice to clean that up, too.  And maybe PCI_ERR_COR_REP_TIMER should be
> masked/restored at the same place for both the Downstream Port and the
> Endpoint?

Since PCI_ERR_COR_REP_TIMER is already masked before 015c9cbcf0ad, so
I didn't think that's necessary.
Do you think it should still be masked just to be safe?

>
> > +#ifdef CONFIG_PCIEAER
> > +static void mask_replay_timer_timeout(struct pci_dev *pdev)
> > +{
> > +     struct pci_dev *parent = pci_upstream_bridge(pdev);
> > +     u32 val;
> > +
> > +     if (!parent || !parent->aer_cap)
> > +             return;
> > +
> > +     pci_read_config_dword(parent, parent->aer_cap + PCI_ERR_COR_MASK, &val);
> > +     val |= PCI_ERR_COR_REP_TIMER;
> > +     pci_write_config_dword(parent, parent->aer_cap + PCI_ERR_COR_MASK, val);
> > +}
> > +
> > +static void unmask_replay_timer_timeout(struct pci_dev *pdev)
> > +{
> > +     struct pci_dev *parent = pci_upstream_bridge(pdev);
> > +     u32 val;
> > +
> > +     if (!parent || !parent->aer_cap)
> > +             return;
> > +
> > +     pci_read_config_dword(pdev, parent->aer_cap + PCI_ERR_COR_MASK, &val);
> > +     val &= ~PCI_ERR_COR_REP_TIMER;
>
> I think I would save the previous PCI_ERR_COR_REP_TIMER value and
> restore it here, so it is preserved if there is ever a generic
> interface via sysfs or similar to manage correctable error masking.

Makes sense, will do in next revision.

Kai-Heng

>
> > +     pci_write_config_dword(pdev, parent->aer_cap + PCI_ERR_COR_MASK, val);
> > +}

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ