Message-ID: <68815a66459e4_134cc710012@dwillia2-xfh.jf.intel.com.notmuch>
Date: Wed, 23 Jul 2025 14:55:50 -0700
From: <dan.j.williams@...el.com>
To: Terry Bowman <terry.bowman@....com>, <dave@...olabs.net>,
<jonathan.cameron@...wei.com>, <dave.jiang@...el.com>,
<alison.schofield@...el.com>, <dan.j.williams@...el.com>,
<bhelgaas@...gle.com>, <shiju.jose@...wei.com>, <ming.li@...omail.com>,
<Smita.KoralahalliChannabasappa@....com>, <rrichter@....com>,
<dan.carpenter@...aro.org>, <PradeepVineshReddy.Kodamati@....com>,
<lukas@...ner.de>, <Benjamin.Cheatham@....com>,
<sathyanarayanan.kuppuswamy@...ux.intel.com>, <terry.bowman@....com>,
<linux-cxl@...r.kernel.org>
CC: <linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>
Subject: Re: [PATCH v10 00/17] Enable CXL PCIe Port Protocol Error handling
and logging
Terry Bowman wrote:
> This patchset updates CXL Protocol Error handling for CXL Ports and CXL
> Endpoints (EP). The reach of this patchset grew from CXL Ports to include
> EPs as well.
[..]
> == Testing ==
> The testing results below show that Upstream Switch Port UCE and EP UCE
> errors are handled as PCI errors. This is because aer_get_device_error_info()
> does not populate the AER error severity and status in the case of a FATAL
> UCE on Upstream Ports and Endpoints. This is intentional, because the USP
> link used to access the device may be compromised. As a result, the
> is_cxl_error() and is_internal_error() checks fail, and the error is then
> processed as a PCI error. The AER event logging is also missing the PCIe AER
> status.
Are those issues "TODO" or permanent quirks of the implementation?
Although, looking at the error messages, they all seem to correctly say "CXL
Bus Error", so I guess I am not seeing the end-user-visible problem with the
details you are pointing out here. I.e. LGTM.
[..]
> == Root Port ==
> root@...wman-cxl:~/aer-inject# ./root-ce-inject.sh
Where can I find these inject scripts?
> pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0c:00.0
> pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0c:00.0
> aer_event: 0000:0c:00.0 CXL Bus Error: severity=Corrected, Corrected Internal Error, TLP Header=Not available
> pcieport 0000:0c:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
> pcieport 0000:0c:00.0: device [8086:7075] error status/mask=00004000/0000a000
> pcieport 0000:0c:00.0: [14] CorrIntErr
> cxl_aer_correctable_error: memdev=0000:0c:00.0 host=pci0000:0c serial=0 status='CRC Threshold Hit'
Hmm, why "memdev=" for a root port error? Will take a look at what
cxl_aer_correctable_error() is doing.
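The status/mask pair in the Root Port log can be checked by hand. Only bits that are set in status and clear in mask are reported. Here 0x00004000 & ~0x0000a000 leaves just bit 14 (Corrected Internal Error), while the mask covers bits 13 (Advisory Non-Fatal) and 15 (Header Log Overflow). Below is a minimal standalone C sketch of that decoding. The short names mirror the ones the kernel prints (e.g. "CorrIntErr"), but decode_cor_status() and its table are illustrative helpers, not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Correctable Error Status bit names per the PCIe AER register layout;
 * reserved bits are left NULL. */
static const char *const cor_err_names[32] = {
	[0]  = "RxErr",
	[6]  = "BadTLP",
	[7]  = "BadDLLP",
	[8]  = "Rollover",
	[12] = "Timeout",
	[13] = "NonFatalErr",
	[14] = "CorrIntErr",
	[15] = "HeaderOF",
};

/* Print every set, unmasked bit; return how many bits were printed. */
static int decode_cor_status(uint32_t status, uint32_t mask)
{
	uint32_t unmasked = status & ~mask;
	int count = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		if (!(unmasked & (1u << bit)))
			continue;
		printf("[%2u] %s\n", bit,
		       cor_err_names[bit] ? cor_err_names[bit] : "Unknown");
		count++;
	}
	return count;
}
```

Calling decode_cor_status(0x00004000, 0x0000a000) prints "[14] CorrIntErr" and returns 1, matching the last AER line of the Root Port log.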
[..]
> base-commit: 716ba3023561ccacfaa28f988d26717535b8fed1
I cannot find this commit in mainline or linux-next. Please try to base the
series on a mainline tag, or otherwise push a public baseline branch
somewhere. That helps reviewers and build bots.