Date: Wed, 29 May 2024 05:32:50 +0000
From: "Duan, Zhenzhong" <zhenzhong.duan@...el.com>
To: "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>
CC: "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
	"rafael@...nel.org" <rafael@...nel.org>, "lenb@...nel.org" <lenb@...nel.org>,
	"james.morse@....com" <james.morse@....com>, "Luck, Tony"
	<tony.luck@...el.com>, "bp@...en8.de" <bp@...en8.de>, "dave@...olabs.net"
	<dave@...olabs.net>, "jonathan.cameron@...wei.com"
	<jonathan.cameron@...wei.com>, "Jiang, Dave" <dave.jiang@...el.com>,
	"Schofield, Alison" <alison.schofield@...el.com>, "Verma, Vishal L"
	<vishal.l.verma@...el.com>, "Weiny, Ira" <ira.weiny@...el.com>,
	"bhelgaas@...gle.com" <bhelgaas@...gle.com>, "helgaas@...nel.org"
	<helgaas@...nel.org>, "mahesh@...ux.ibm.com" <mahesh@...ux.ibm.com>,
	"oohall@...il.com" <oohall@...il.com>, "linmiaohe@...wei.com"
	<linmiaohe@...wei.com>, "shiju.jose@...wei.com" <shiju.jose@...wei.com>,
	"Preble, Adam C" <adam.c.preble@...el.com>, "lukas@...ner.de"
	<lukas@...ner.de>, "Smita.KoralahalliChannabasappa@....com"
	<Smita.KoralahalliChannabasappa@....com>, "rrichter@....com"
	<rrichter@....com>, "linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
	"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Tsaur, Erwin"
	<erwin.tsaur@...el.com>, "Kuppuswamy, Sathyanarayanan"
	<sathyanarayanan.kuppuswamy@...el.com>, "Williams, Dan J"
	<dan.j.williams@...el.com>, "Wanyan, Feiting" <feiting.wanyan@...el.com>,
	"Wang, Yudong" <yudong.wang@...el.com>, "Peng, Chao P"
	<chao.p.peng@...el.com>, "qingshun.wang@...ux.intel.com"
	<qingshun.wang@...ux.intel.com>
Subject: RE: [PATCH v4 0/3] PCI/AER: Handle Advisory Non-Fatal error

Hi,

Gentle ping.
I would appreciate any comments or suggestions so that I can proceed.

Thanks
Zhenzhong

>-----Original Message-----
>From: Duan, Zhenzhong <zhenzhong.duan@...el.com>
>Subject: [PATCH v4 0/3] PCI/AER: Handle Advisory Non-Fatal error
>
>Hi,
>
>This series continues Qingshun's v2 [1], but focuses on ANFE processing,
>as the subject suggests, and drops the trace event for now. I think the
>extra I/O needed to read PCIe registers purely for tracing is a bit heavy,
>and I have not seen the community ask for it so far.
>
>According to PCIe Base Specification Revision 6.1, Sections 6.2.3.2.4 and
>6.2.4.3, certain uncorrectable errors signal ERR_COR instead of
>ERR_NONFATAL, are logged as Advisory Non-Fatal Errors (ANFE), and set bits
>in both the Correctable Error (CE) Status register and the Uncorrectable
>Error (UE) Status register. Currently, when handling AER events, the kernel
>looks at either the CE status or the UE status, but never both. In the ANFE
>case, bits set in the UE status register are neither reported nor cleared
>until the next FE/NFE arrives.
>
>For instance, previously, when the kernel receives an ANFE with Poisoned
>TLP in OS native AER mode, only the status of CE will be reported and
>cleared:
>
>  AER: Correctable error message received from 0000:b7:02.0
>  PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
>    device [8086:0db0] error status/mask=00002000/00000000
>     [13] NonFatalErr
>
>If the kernel receives a Malformed TLP after that, two UEs will be
>reported, which is unexpected. The Malformed TLP Header is lost since
>the previous ANFE gated the TLP header logs:
>
>  PCIe Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
>    device [8086:0db0] error status/mask=00041000/00180020
>     [12] TLP                    (First)
>     [18] MalfTLP
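
To make the status word in the log above easier to read, here is a minimal userspace sketch that lists which bit positions are set in a 32-bit AER status value. The helper name `status_bits` is hypothetical and is not part of the kernel or of this series; it only illustrates why 00041000 is printed as bits [12] and [18].

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): collect the indices of the set
 * bits in a 32-bit AER status word, lowest bit first.
 * Returns the number of indices written to out[] (at most 32).
 */
static int status_bits(uint32_t status, int out[32])
{
	int n = 0;

	for (int bit = 0; bit < 32; bit++) {
		if (status & (UINT32_C(1) << bit))
			out[n++] = bit;
	}
	return n;
}
```

Called with 0x00041000 it fills out[] with 12 and 18, matching the two UE lines in the log; called with 0x00002000 it yields 13, the NonFatalErr bit of the CE status.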
>
>To handle this case properly, calculate the status bits that are
>potentially related to the ANFE and save them in aer_err_info. Use this
>information to determine which status bits need to be cleared.
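
The calculation described above might look roughly like the following userspace sketch. It assumes that a UE status bit is an ANFE candidate when the Advisory Non-Fatal bit (bit 13) is set in the CE status and the UE bit is both unmasked and configured as non-fatal in the severity register. The struct, macro and function names here are hypothetical, and the actual patches may narrow the candidate set further (for example to the error types that the specification allows to be advisory).

```c
#include <stdint.h>

/* Bit 13 of the Correctable Error Status register: Advisory Non-Fatal. */
#define COR_ADV_NFAT UINT32_C(0x00002000)

/* Hypothetical snapshot of the AER registers involved (not kernel code). */
struct anfe_sketch {
	uint32_t cor_status;     /* Correctable Error Status */
	uint32_t uncor_status;   /* Uncorrectable Error Status */
	uint32_t uncor_mask;     /* Uncorrectable Error Mask, 1 = masked */
	uint32_t uncor_severity; /* Uncorrectable Error Severity, 1 = fatal */
};

/*
 * Return the UE status bits that may have been signaled as an ANFE:
 * set in the status register, not masked, and configured as non-fatal.
 * Returns 0 when the CE status does not report an Advisory Non-Fatal error.
 */
static uint32_t anfe_candidate_bits(const struct anfe_sketch *r)
{
	if (!(r->cor_status & COR_ADV_NFAT))
		return 0;

	return r->uncor_status & ~r->uncor_mask & ~r->uncor_severity;
}
```

The returned mask is what a handler would both report and write back to the UE status register to clear, leaving any unrelated fatal bits for the normal UE path.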
>
>Now, for the previous scenario, both CE status and related UE status will
>be reported and cleared after ANFE:
>
>  AER: Correctable error message received from 0000:b7:02.0
>  PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
>    device [8086:0db0] error status/mask=00002000/00000000
>     [13] NonFatalErr
>    Uncorrectable errors that may cause Advisory Non-Fatal:
>     [12] TLP
>
>Note:
>checkpatch.pl produces the following warnings on patches 2 and 3:
>
>WARNING: 'UE' may be misspelled - perhaps 'USE'?
>#22:
>uncorrectable error(UE) status should be cleared. However, there is no
>
>...similar warnings omitted...
>
>These are false positives, so they are not fixed.
>
>WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
>#10:
>  PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
>
>...similar warnings omitted...
>
>For readability reasons, these warnings are not fixed.
>
>
>
>[1] https://lore.kernel.org/linux-pci/20240125062802.50819-1-qingshun.wang@...ux.intel.com
>
>Thanks
>Qingshun, Zhenzhong
>
>Changelog:
>v4:
>  - Fix a race in anfe_get_uc_status() (Jonathan)
>  - Add a comment to explain side effect of processing ANFE as NFE (Jonathan)
>  - Drop the check for PCI_EXP_DEVSTA_NFED
>
>v3:
>  - Split ANFE print and processing to two patches (Bjorn)
>  - Simplify ANFE handling, drop trace event
>  - Polish comments and patch description
>  - Add Tested-by
>
>v2:
>  - Reference to the latest PCIe Specification in both commit messages
>    and comments, as suggested by Bjorn Helgaas.
>  - Describe the reason for storing additional information in
>    aer_err_info in the commit message of PATCH 1, as suggested by Bjorn
>    Helgaas.
>  - Add more details of behavior changes in the commit message of PATCH
>    2, as suggested by Bjorn Helgaas.
>
>v3: https://lore.kernel.org/lkml/20240417061407.1491361-1-zhenzhong.duan@...el.com
>v2: https://lore.kernel.org/linux-pci/20240125062802.50819-1-qingshun.wang@...ux.intel.com
>v1: https://lore.kernel.org/linux-pci/20240111073227.31488-1-qingshun.wang@...ux.intel.com
>
>Zhenzhong Duan (3):
>  PCI/AER: Store UNCOR_STATUS bits that might be ANFE in aer_err_info
>  PCI/AER: Print UNCOR_STATUS bits that might be ANFE
>  PCI/AER: Clear UNCOR_STATUS bits that might be ANFE
>
> drivers/pci/pci.h      |  1 +
> drivers/pci/pcie/aer.c | 75 +++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 75 insertions(+), 1 deletion(-)
>
>--
>2.34.1

