Message-ID: <20250818151855.2950059-1-joshua.hahnjy@gmail.com>
Date: Mon, 18 Aug 2025 08:18:53 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Terry Bowman <terry.bowman@....com>
Cc: dave@...olabs.net,
jonathan.cameron@...wei.com,
dave.jiang@...el.com,
alison.schofield@...el.com,
dan.j.williams@...el.com,
bhelgaas@...gle.com,
shiju.jose@...wei.com,
ming.li@...omail.com,
Smita.KoralahalliChannabasappa@....com,
rrichter@....com,
dan.carpenter@...aro.org,
PradeepVineshReddy.Kodamati@....com,
lukas@...ner.de,
Benjamin.Cheatham@....com,
sathyanarayanan.kuppuswamy@...ux.intel.com,
linux-cxl@...r.kernel.org,
linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org
Subject: Re: [PATCH v10 00/17] Enable CXL PCIe Port Protocol Error handling and logging
On Thu, 26 Jun 2025 17:42:35 -0500 Terry Bowman <terry.bowman@....com> wrote:
> This patchset updates CXL Protocol Error handling for CXL Ports and CXL
> Endpoints (EP). The reach of this patchset grew from CXL Ports to include
> EPs as well.
>
> This patchset is a continuation of v9 found here:
> https://lore.kernel.org/linux-cxl/20250603172239.159260-1-terry.bowman@amd.com/
>
> The first patch is a small cleanup change to reduce amount of code.
>
> The next 2 patches introduce pci_dev::is_cxl, aer_info::is_cxl, and add
> bus string to AER log tracing. aer_info::is_cxl will be used to indicate a
> CXL or PCI error and will be used to direct the error handling flow in
> later patches.
>
> The next patch introduces a new driver file, pci/pcie/cxl_aer.c, to move
> the existing CXL AER logic into.
>
> The next 3 patches update the AER driver and CXL driver to use a kfifo.
> The kfifo is added to offload CXL-AER protocol error work to the CXL
> driver. These patches provide the kfifo work add and work remove.
>
> The next 5 patches prepare the CXL driver for adding the updated protocol
> error handlers. This includes adding CXL Port RAS mapping and updating
> interfaces for common support.
>
> The final 5 patches add the CXL error handlers for CXL EPs and CXL Ports.
> CXL EPs keep the PCIe error handler for cases the EP error is interpreted
> as a PCIe error. These patches also add logic to unmask CXL Protocol Errors
> during port probing, and mask CXL Protocol Errors during port device
> cleanup.
Hello Terry,
Thank you for this new version. I just wanted to add that I have been testing
this new version on a few machines, and it fixes an issue that I was seeing
on v8 of the patchset.
Previously, booting a kernel with the parameter pcie_ports=compat would lead
to a kernel crash caused by a NULL pointer dereference. After I rebased the
kernel to use v10 instead, this went away and I can use pcie_ports=compat
without any complications. I tried to identify which change led to this fix,
but couldn't find anything specific.
It seems like a use-after-free bug and happens specifically in
cxl_dport_init_ras_reporting. Since this new version fixes this issue, please
feel free to add my Tested-by tag in future versions.
Thank you again for your work on this series! I hope you have a great day.
Joshua Hahn
Tested-by: Joshua Hahn <joshua.hahnjy@...il.com>
Sent using hkml (https://github.com/sjp38/hackermail)