Message-ID: <1243a296-f018-42f8-9c07-297862fb0f4a@linux.alibaba.com>
Date: Fri, 9 Jan 2026 20:26:27 +0800
From: Ruidong Tian <tianruidong@...ux.alibaba.com>
To: Borislav Petkov <bp@...en8.de>
Cc: catalin.marinas@....com, will@...nel.org, lpieralisi@...nel.org,
guohanjun@...wei.com, sudeep.holla@....com, xueshuai@...ux.alibaba.com,
linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, rafael@...nel.org, lenb@...nel.org,
tony.luck@...el.com, yazen.ghannam@....com, misono.tomohiro@...itsu.com,
fengwei_yin@...ux.alibaba.com
Subject: Re: [PATCH v5 00/17] ARM Error Source Table V2 Support
On 2026/1/9 18:34, Borislav Petkov wrote:
> On Mon, Jan 05, 2026 at 05:12:25PM +0800, Ruidong Tian wrote:
>>> What is a "RAS node"?
>> A RAS node is the hardware interface for error reporting and control,
>> consisting of one or more register sets (a collection of RAS records). It is
>> responsible for error logging and interrupt signaling[0].
>
> OMG, one more meaning for the word "node". Because we're not ambiguous enough.
>
> /facepalm
>
>> A single hardware component can feature multiple RAS nodes. For example, a
>> memory controller is treated as a "RAS device", where each memory channel
>> has its own RAS node. Interrupts generated by these nodes are typically
>> aggregated into a single interrupt line managed at the RAS device level.
>
> Nomenclaturial tragedy, I'd say.
>
>> Comparison with x86 MCA:
>>
>> RAS record ≈ MCA bank.
>> RAS node ≈ A set of MCA banks + CMCI on a core.
>>
>> The key difference lies in uncore handling: x86 typically maps uncore errors
>> (like those from a memory controller) into core-based MCA banks. In
>> contrast, ARM requires uncore components to provide their own standalone RAS
>> nodes. When a component requires multiple such nodes, they are grouped and
>> managed as a "RAS device" in the AEST driver.
>>
>> [0]: https://developer.arm.com/documentation/ihi0100/latest
>
> Yah, thanks for explaining.
>
>>> The ATL is very AMD-specific. What does "conceptually similar" mean exactly?
>> By "conceptually similar," I mean that both ARM and AMD share the same
>> functional requirement: translating between a System Physical Address (SPA)
>> and a device-specific address (like a DRAM address) for RAS purposes.
>>
>> The goal here is not to share the hardware-specific translation logic, but
>> to provide a unified interface (an abstraction layer). The actual
>> implementation of the translation remains entirely architecture-specific.
>
> And why do we need an arch-overlapping unified interface?
>
> You can just as well have aest_convert_la_to_spa() and none of that "unifying"
> churn.
>
You're right, that would be much cleaner. I was trying too hard to keep
the interface unified across architectures. I'll drop the unified
interface and use a direct helper instead in the next version. Thanks for
the feedback!
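
For illustration only, a direct helper along those lines might look
roughly like the sketch below. Only the name aest_convert_la_to_spa()
comes from this thread. The signature, both struct types, and the
table of linear windows are assumptions made up for this example; the
real translation is hardware-specific and is not expected to be a
simple table lookup.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: one linear window mapping device-specific addresses to
 * System Physical Addresses (SPA). Real hardware decoding (interleaving,
 * hashing, holes) would replace this table.
 */
struct aest_addr_window {
	uint64_t la_base;	/* first device-specific address covered */
	uint64_t spa_base;	/* SPA corresponding to la_base */
	uint64_t size;		/* window length in bytes */
};

/* Hypothetical per-node state, reduced to what the helper needs. */
struct aest_node_sketch {
	const struct aest_addr_window *windows;
	size_t nr_windows;
};

/*
 * Translate a device-specific address reported by a node into an SPA.
 * Returns 0 on success, -EINVAL on bad arguments, and -ERANGE when no
 * window covers the address.
 */
static int aest_convert_la_to_spa(const struct aest_node_sketch *node,
				  uint64_t la, uint64_t *spa)
{
	size_t i;

	if (!node || !spa)
		return -EINVAL;

	for (i = 0; i < node->nr_windows; i++) {
		const struct aest_addr_window *w = &node->windows[i];

		/* Subtract before comparing so the range check cannot overflow. */
		if (la >= w->la_base && la - w->la_base < w->size) {
			*spa = w->spa_base + (la - w->la_base);
			return 0;
		}
	}

	return -ERANGE;
}
```

The caller stays entirely inside the AEST code: it passes the node that
reported the error and the address from the error record, and there is
no cross-architecture ops table to register or look up.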