lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e876857b-ab34-5b43-8355-649a7d15e2c6@amd.com>
Date:   Mon, 24 Apr 2023 13:39:07 -0500
From:   Terry Bowman <Terry.Bowman@....com>
To:     Dan Williams <dan.j.williams@...el.com>,
        alison.schofield@...el.com, vishal.l.verma@...el.com,
        ira.weiny@...el.com, bwidawsk@...nel.org, dave.jiang@...el.com,
        Jonathan.Cameron@...wei.com, linux-cxl@...r.kernel.org
Cc:     rrichter@....com, linux-kernel@...r.kernel.org, bhelgaas@...gle.com
Subject: Re: [PATCH v3 4/6] cxl/pci: Add RCH downstream port error logging

Hi Dan,

I added comments inline below.

On 4/17/23 19:06, Dan Williams wrote:
> Terry Bowman wrote:
>> RCH downstream port error logging is missing in the current CXL driver. The
>> missing AER and RAS error logging is needed for communicating driver error
>> details to userspace. Update the driver to include PCIe AER and CXL RAS
>> error logging.
>>
>> Add RCH downstream port error handling into the existing RCiEP handler.
>> The downstream port error handler is added to the RCiEP error handler
>> because the downstream port is implemented in a RCRB, is not PCI
>> enumerable, and as a result is not directly accessible to the PCI AER
>> root port driver. The AER root port driver calls the RCiEP handler for
>> handling RCD errors and RCH downstream port protocol errors.
>>
>> Update mem.c to include RAS and AER setup. This includes AER and RAS
>> capability discovery and mapping for later use in the error handler.
>>
>> Disable RCH downstream port's root port cmd interrupts.[1]
>>
>> Update existing RCiEP correctable and uncorrectable handlers to also call
>> the RCH handler. The RCH handler will read the RCH AER registers, check for
>> error severity, and if an error exists will log using an existing kernel
>> AER trace routine. The RCH handler will also log downstream port RAS errors
>> if they exist.
> 
> I think this patch wants a lead in refactoring to move the existing
> probe of the CXL RAS capability into the cxl_port driver so that the RCH
> path and the VH path can be unified for register mapping and error
> handling invocation. I do not see a compelling rationale to have 2
> separate ways to map the RAS capability. The timing of when
> cxl_setup_ras() is called looks problematic relative to when the first
> error handler callback might happen.
> 

With respect to timing, I see this works for probing AER and RAS. Will it 
work for caching the mapped AER and RAS addresses? I ask because the mapped 
AER and RAS addresses are stored in cxlds and cxlds is created in cxl_pci 
and isn't necessarily available during RCH dport discovery. RCH dport is 
discovered within cxl_acpi context (beginning from cxl_acpi_probe()). Also, 
port.c code shows cxlds is not typically used. 

If you like I can change RCH RAS mapping to use cxl_map_component_regs()? This 
was in cxl_rch_map_ras() to handle the RCH odd case for AER and RAS mapping. 
The RAS can be moved out but RCH AER would still need to be mapped presumably still
in cxl_rch_map_ras().

> For example what happens when an error fires after cxl_pci has
> registered its error handlers, but before the component registers have
> been mapped out of the RCRB?
>

The RCiEP ISR would execute but the RCH AER and RAS would not be logged because 
neither are mapped and are instead NULL. The AER and RAS register status would stay 
resident and be logged in the next ISR entry. 
 
> This implies the need for a callback for cxl_pci to notify the cxl_port
> driver of CXL errors to handle relative to a PCI AER event.

Along similar lines, could the RCH AER and RAS status be checked immediately after 
mapping and logged if status is present?

Regards,
Terry

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ