Message-ID: <691e3542c1b2a_1a37510046@dwillia2-mobl4.notmuch>
Date: Wed, 19 Nov 2025 13:23:14 -0800
From: <dan.j.williams@...el.com>
To: Terry Bowman <terry.bowman@....com>, <dave@...olabs.net>,
<jonathan.cameron@...wei.com>, <dave.jiang@...el.com>,
<alison.schofield@...el.com>, <dan.j.williams@...el.com>,
<bhelgaas@...gle.com>, <shiju.jose@...wei.com>, <ming.li@...omail.com>,
<Smita.KoralahalliChannabasappa@....com>, <rrichter@....com>,
<dan.carpenter@...aro.org>, <PradeepVineshReddy.Kodamati@....com>,
<lukas@...ner.de>, <Benjamin.Cheatham@....com>,
<sathyanarayanan.kuppuswamy@...ux.intel.com>, <linux-cxl@...r.kernel.org>,
<alucerop@....com>, <ira.weiny@...el.com>
CC: <linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>,
<terry.bowman@....com>
Subject: Re: [RESEND v13 12/25] cxl/pci: Unify CXL trace logging for CXL
Endpoints and CXL Ports
Terry Bowman wrote:
> CXL currently has separate trace routines for CXL Port errors and CXL
> Endpoint errors. This is inconvenient for the user because they must enable
> 2 sets of trace routines. Make updates to the trace logging such that a
> single trace routine logs both CXL Endpoint and CXL Port protocol errors.
No, this is not inconvenient; it is required for the compatible evolution
of tracepoints. The change in this patch breaks compatibility because it
violates the expectation that the type and order of TP_ARGS do not change
from one kernel to the next.
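To illustrate the concern, here is a minimal user-space sketch, not kernel
code. It assumes a consumer that sees tracepoint arguments only by position,
the way a raw probe would. The struct, the function names, and both argument
layouts are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical raw probe context: arguments are visible only by position. */
struct raw_args {
	uint64_t args[3];
	unsigned int nr;
};

/* Consumer built against an assumed old layout: (dev, status). */
uint64_t old_consumer_status(const struct raw_args *ctx)
{
	return ctx->args[1];
}

/* Producer using the assumed old layout: (dev, status). */
struct raw_args emit_old(uint64_t dev, uint64_t status)
{
	struct raw_args a = { .args = { dev, status, 0 }, .nr = 2 };
	return a;
}

/* Producer after a hypothetical reorder: (dev, serial, status). */
struct raw_args emit_reordered(uint64_t dev, uint64_t serial, uint64_t status)
{
	struct raw_args a = { .args = { dev, serial, status }, .nr = 3 };
	return a;
}
```

With the reordered producer, the unchanged consumer silently reads the
serial number where it expects the status, with no build-time or run-time
error to flag the mismatch.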
> Keep the trace log fields 'memdev' and 'host'. While these are not accurate
> for non-Endpoints the fields will remain as-is to prevent breaking
> userspace RAS trace consumers.
>
> Add serial number parameter to the trace logging. This is used for EPs
> and 0 is provided for CXL port devices without a serial number.
>
> Leave the correctable and uncorrectable trace routines' TP_STRUCT__entry()
> unchanged with respect to member data types and order.
>
> Below is output of correctable and uncorrectable protocol error logging.
> CXL Root Port and CXL Endpoint examples are included below.
>
> Root Port:
> cxl_aer_correctable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0 status='CRC Threshold Hit'
> cxl_aer_uncorrectable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0 status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
A root port is not a "memdev", which is another awkward side effect of
trying to combine 2 tracepoints with different use cases.
So a NAK from me for this change (unless there is a strong reason for
Linux to inflict the compat breakage). Please keep the separate
tracepoints; they are for distinctly different use cases. A memdev
protocol error is contained to that memdev, whereas a port protocol error
implicates every CXL.cachemem descendant of that port.
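The difference in scope can be sketched with a toy topology walk. This is an
illustration only: struct node and implicated_memdevs() are hypothetical
names, not kernel structures or APIs, and the model treats every memdev
below a port as a CXL.cachemem descendant.

```c
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical topology node; not a kernel structure. */
struct node {
	const char *name;
	int is_memdev;
	struct node *child[MAX_CHILDREN];
	size_t nr_children;
};

/*
 * Count the memdevs implicated by a protocol error reported at @n.
 * A memdev (leaf) implicates only itself; a port implicates every
 * memdev reachable below it.
 */
size_t implicated_memdevs(const struct node *n)
{
	size_t count = n->is_memdev ? 1 : 0;

	for (size_t i = 0; i < n->nr_children; i++)
		count += implicated_memdevs(n->child[i]);
	return count;
}
```

A consumer handling a memdev event can act on one device, while a consumer
handling a port event has to consider the whole subtree, which is why the
two events serve different use cases.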