Message-ID: <7497d32c-14b0-4ee5-8e0c-70be470bee0d@amd.com>
Date: Fri, 7 Feb 2025 13:05:14 -0600
From: "Bowman, Terry" <terry.bowman@....com>
To: Gregory Price <gourry@...rry.net>
Cc: linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, nifan.cxl@...il.com, dave@...olabs.net,
jonathan.cameron@...wei.com, dave.jiang@...el.com,
alison.schofield@...el.com, vishal.l.verma@...el.com,
dan.j.williams@...el.com, bhelgaas@...gle.com, mahesh@...ux.ibm.com,
ira.weiny@...el.com, oohall@...il.com, Benjamin.Cheatham@....com,
rrichter@....com, nathan.fontenot@....com,
Smita.KoralahalliChannabasappa@....com, lukas@...ner.de,
ming.li@...omail.com, PradeepVineshReddy.Kodamati@....com, alucerop@....com
Subject: Re: [PATCH v5 05/16] PCI/AER: Add CXL PCIe Port correctable error
support in AER service driver
On 2/6/2025 12:33 PM, Gregory Price wrote:
> On Tue, Jan 07, 2025 at 08:38:41AM -0600, Terry Bowman wrote:
>> The AER service driver supports handling Downstream Port Protocol Errors in
>> Restricted CXL host (RCH) mode also known as CXL1.1. It needs the same
>> functionality for CXL PCIe Ports operating in Virtual Hierarchy (VH)
>> mode.[1]
>>
>> CXL and PCIe Protocol Error handling have different requirements that
>> necessitate a separate handling path. The AER service driver may try to
>> recover PCIe uncorrectable non-fatal errors (UCE). The same recovery is not
>> suitable for CXL PCIe Port devices because of potential for system memory
>> corruption. Instead, CXL Protocol Error handling must use a kernel panic
>> in the case of a fatal or non-fatal UCE. The AER driver's PCIe Protocol
>> Error handling does not panic the kernel in response to a UCE.
>>
> Naive question: is a panic actually required if the memory is a userland
> resource?
>
> The code in arch/x86/kernel/cpu/mce/core.c suggests we may not panic
> if an uncorrectable error occurs in this fashion, but simply a SIGBUS.
>
> Unless this is down the wrong pipe - in which case disregard.
>
> I'm still digging through background on this patch set so I may be
> barking up the wrong tree.
>
> ~Gregory
The plan is to panic on any uncorrectable error from a CXL device,
regardless of where its memory is used. This is to avoid corruption.
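
As a rough illustration of that policy, here is a minimal userspace sketch.
It is not code from the patch set: the enum names, the function names and
the memory_is_userland parameter are all hypothetical and exist only to
show the decision being described.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these come from the kernel. */
enum err_severity {
	ERR_CORRECTABLE,
	ERR_UNCORRECTABLE_NONFATAL,
	ERR_UNCORRECTABLE_FATAL,
};

enum err_action {
	ACT_LOG,	/* report the error and continue */
	ACT_RECOVER,	/* attempt recovery */
	ACT_PANIC,	/* stop the system */
};

/*
 * PCIe path as the commit message describes it: recovery may be attempted
 * for an uncorrectable error, and the kernel is not panicked.
 * Treating the fatal case the same way here is an assumption of the sketch.
 */
static enum err_action pcie_policy(enum err_severity sev)
{
	if (sev == ERR_CORRECTABLE)
		return ACT_LOG;
	return ACT_RECOVER;
}

/*
 * CXL path as planned: any uncorrectable error panics. The
 * memory_is_userland flag is deliberately ignored, which is the point of
 * "regardless of where used".
 */
static enum err_action cxl_policy(enum err_severity sev, bool memory_is_userland)
{
	(void)memory_is_userland;

	if (sev == ERR_CORRECTABLE)
		return ACT_LOG;
	return ACT_PANIC;
}
```

With these definitions, cxl_policy(ERR_UNCORRECTABLE_NONFATAL, true) and
cxl_policy(ERR_UNCORRECTABLE_NONFATAL, false) both yield ACT_PANIC, while
pcie_policy(ERR_UNCORRECTABLE_NONFATAL) yields ACT_RECOVER.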
Terry