Message-ID: <20251230202211.GAaVQ0cx8o-CqzGU2O@fat_crate.local>
Date: Tue, 30 Dec 2025 21:22:11 +0100
From: Borislav Petkov <bp@...en8.de>
To: Ruidong Tian <tianruidong@...ux.alibaba.com>
Cc: catalin.marinas@....com, will@...nel.org, lpieralisi@...nel.org,
guohanjun@...wei.com, sudeep.holla@....com,
xueshuai@...ux.alibaba.com, linux-kernel@...r.kernel.org,
linux-acpi@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
rafael@...nel.org, lenb@...nel.org, tony.luck@...el.com,
yazen.ghannam@....com, misono.tomohiro@...itsu.com,
fengwei_yin@...ux.alibaba.com
Subject: Re: [PATCH v5 00/17] ARM Error Source Table V2 Support

Some high-level notes first:
On Tue, Dec 30, 2025 at 05:09:28PM +0800, Ruidong Tian wrote:
> This series introduces support for the ARM Error Source Table (AEST), aligning
> with version 2.0 of ACPI for Armv8 RAS Extensions [0].
I'd like to hear from ARM folks what the strategy for this thing is...
> AEST provides a critical mechanism for hardware to directly notify the
> operating system kernel about RAS errors via interrupts, a concept known as
> Kernel-first error handling. Compared to firmware-first error handling
> (e.g., GHES), AEST offers a more lightweight approach. This efficiency allows
> the OS to potentially report every Corrected Error (CE), enabling upper-layer
> applications to leverage CE information for error prediction [1][2].
>
> This series is based on Tyler Baicar's preliminary patches [3], which have not
> yet been sent to the mailing list as v2.
I guess I'll wait for those first.
> AEST Driver Architecture
> ========================
>
> The AEST driver is structured into three primary components:
> - AEST device: Responsible for handling interrupts, managing the lifecycle
> of AEST nodes, and processing error records.
> - AEST node: Corresponds directly to a RAS node in the hardware
What is a "RAS node"?
> - AEST record: Represents a set of RAS registers associated with a specific
> error source.
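To make the device/node/record split described above concrete, here is a minimal userspace sketch of that three-level hierarchy and an interrupt-time walk over it. All struct and function names (aest_device, aest_node, aest_record, aest_device_process) are hypothetical and not taken from the patches; as a simplification, any non-zero status value counts as a logged error, whereas the architected RAS registers use a Valid bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; names and fields are illustrative only. */

/* One error record: the RAS registers for a single error source. */
struct aest_record {
	uint64_t status;	/* simplified: non-zero means an error is logged */
	uint64_t addr;		/* address reported with the error */
};

/* One AEST node: one hardware RAS node owning a range of records. */
struct aest_node {
	struct aest_record *records;
	size_t nr_records;
};

/* The AEST device: owns the nodes and handles the interrupt. */
struct aest_device {
	struct aest_node *nodes;
	size_t nr_nodes;
};

/*
 * Hypothetical interrupt-time walk: visit every record of every node,
 * count records holding an error and clear them once consumed.
 */
static size_t aest_device_process(struct aest_device *dev)
{
	size_t found = 0;

	assert(dev != NULL);
	for (size_t n = 0; n < dev->nr_nodes; n++) {
		struct aest_node *node = &dev->nodes[n];

		for (size_t r = 0; r < node->nr_records; r++) {
			struct aest_record *rec = &node->records[r];

			if (!rec->status)
				continue;
			/* A real driver would decode and report the error here. */
			found++;
			rec->status = 0;
		}
	}
	return found;
}
```

The nesting mirrors the description: the device is the only entry point, and each node only knows about its own records.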
...
> Address Translation
> ===================
>
> As described in section 2.2 [0], error addresses reported by AEST records
> may be "node-specific Logical Addresses" rather than the "System Physical
> Addresses" (SPA) used by the kernel. Therefore, the driver needs to translate
> these Logical Addresses (LA) to SPA. This translation mechanism is conceptually
> similar to AMD's Address Translation Logic (ATL) [4], leading patch 0014 to
> introduce a common translation function for both AMD and ARM architectures.
Say what now?
The ATL is very AMD-specific. What does "conceptually similar" mean exactly?
What happens if we have to change the ATL and break your use case in the
process?
What exact functionality from the ATL do you really need here?
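One hypothetical way to frame that question: the minimal architecture-neutral piece an AEST record needs is a per-node hook that turns a node-specific logical address (LA) into a system physical address (SPA), falling back to the raw address when the record already reports an SPA. The sketch below assumes exactly that. The names la_to_spa_fn, addr_xlate, record_addr_to_spa and the base-plus-offset translator are invented for illustration; this is neither an existing kernel API nor the AMD ATL interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node LA -> SPA hook; returns false if untranslatable. */
typedef bool (*la_to_spa_fn)(uint64_t la, uint64_t *spa, const void *ctx);

struct addr_xlate {
	la_to_spa_fn xlate;	/* NULL: the record already reports an SPA */
	const void *ctx;	/* node-specific translation parameters */
};

static bool record_addr_to_spa(const struct addr_xlate *x, uint64_t addr,
			       uint64_t *spa)
{
	assert(x != NULL && spa != NULL);
	if (!x->xlate) {
		*spa = addr;	/* identity: address is already an SPA */
		return true;
	}
	return x->xlate(addr, spa, x->ctx);
}

/* Example translator: a fixed window at a base offset, standing in for
 * whatever platform-specific logic a real node would need. */
struct base_offset {
	uint64_t base;
	uint64_t size;
};

static bool base_offset_xlate(uint64_t la, uint64_t *spa, const void *ctx)
{
	const struct base_offset *bo = ctx;

	if (la >= bo->size)
		return false;	/* LA outside the node's window */
	*spa = bo->base + la;
	return true;
}
```

Keeping the translation behind a per-node hook would confine any platform-specific logic to its provider, so changes on one architecture need not touch the other.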
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette