Message-ID: <65f3a842988d6_a9b4294f7@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Date: Thu, 14 Mar 2024 18:45:38 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Li Ming <ming4.li@...el.com>, <dan.j.williams@...el.com>,
	<rrichter@....com>, <terry.bowman@....com>
CC: <linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Li Ming
	<ming4.li@...el.com>
Subject: Re: [RFC PATCH 0/6] Add support for root port RAS error handling

Li Ming wrote:
> Protocol errors signaled to a CXL root port may be captured by a Root
> Complex Event Collector (RCEC). If those errors are not cleared and
> reported, the system owner loses forensic information for system failure
> analysis.
> 
> Per CXL r3.1 section 9.18.1.5, the CXL specification's recommendation
> for this case is the 'Else' statement in the 'IMPLEMENTATION NOTE' under
> 'Table 9-24 RDPAS Structure':
> 
> 	"Probe all CXL Downstream Ports and determine whether they have logged an
> 	error in the CXL.io or CXL.cachemem status registers."
> 
> The CXL subsystem already supports RCH RAS Error handling that has a
> dependency on the RCEC. Reuse and extend that RCH topology support to
> handle reported errors in the VH topology case. The implementation is
> composed of:
> * Provide a new interface on the RCEC side to support walking all devices
>   under the RCEC and the RCEC-associated bus ranges. The PCIe AER core uses
>   this interface to walk all CXL endpoints and all CXL root ports under the
>   bus ranges.
> * Update the PCIe AER core to enable Uncorrectable Internal Error and
>   Correctable Internal Error reporting for root ports.

Thanks for the above background.

> * Invoke the cxl_pci error handler for RCEC reported errors.

So what do you expect happens when a switch is involved? In the RCH case
the handler knows that the only thing that can trigger the RCEC is a root
complex integrated endpoint implementation driven by cxl_pci. In the VH
case it could be a switch.

> * Handle root-port errors in the cxl_pci handler when the device is
>   direct attached.

I do expect direct-attach to be a predominant use case, but I want to
make sure that the implementation at least does not make the switch port
error handling case more difficult to implement.
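
To make the concern concrete, here is a minimal userspace sketch of the
dispatch decision being discussed: probe each downstream port, and only
route to the cxl_pci handler when the erroring port is a root port with a
direct-attached endpoint. Every name in it (struct dport, pick_handler(),
the HANDLER_* values) is hypothetical and is not kernel API; the
"switch port" handler is a placeholder for a path that this series does
not implement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model for illustration only; these are not kernel types. */
enum port_kind { PORT_ROOT, PORT_SWITCH_DOWNSTREAM };

struct dport {
	enum port_kind kind;
	bool error_logged;	/* stands in for CXL.io / CXL.cachemem status */
	bool direct_attached;	/* endpoint sits directly below this port */
};

enum handler { HANDLER_NONE, HANDLER_CXL_PCI, HANDLER_SWITCH_PORT };

/* Decide which (hypothetical) handler should own an error on this port. */
static enum handler pick_handler(const struct dport *p)
{
	assert(p != NULL);

	if (!p->error_logged)
		return HANDLER_NONE;

	/* Direct attach: the endpoint driver can handle the root port. */
	if (p->kind == PORT_ROOT && p->direct_attached)
		return HANDLER_CXL_PCI;

	/* A switch is involved, so some other owner has to take over. */
	return HANDLER_SWITCH_PORT;
}

/* Probe every port and count how many would be routed to handler @h. */
static size_t count_for_handler(const struct dport *ports, size_t n,
				enum handler h)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (pick_handler(&ports[i]) == h)
			count++;

	return count;
}
```

Keeping the decision in one place like pick_handler() is one way to leave
room for a switch port path later without reworking the direct-attach
case.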
