Message-ID: <1b316667-470b-4e1a-9c18-e42571e4769c@kernel.org>
Date: Tue, 4 Nov 2025 10:33:39 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Xie Yuanbin <xieyuanbin1@...wei.com>, david@...hat.com,
dave.hansen@...el.com, bp@...en8.de, tglx@...utronix.de, mingo@...hat.com,
dave.hansen@...ux.intel.com, hpa@...or.com, akpm@...ux-foundation.org,
lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
rppt@...nel.org, surenb@...gle.com, mhocko@...e.com, linmiaohe@...wei.com,
nao.horiguchi@...il.com, luto@...nel.org, peterz@...radead.org,
tony.luck@...el.com
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-edac@...r.kernel.org, will@...nel.org, liaohua4@...wei.com,
lilinjie8@...wei.com
Subject: Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with
SPARSEMEM
On 04.11.25 08:23, Xie Yuanbin wrote:
> Memory bit flips are among the most common hardware errors in the server
> and embedded fields. Many hardware components have memory verification
> mechanisms, for example ECC. When an error is detected, some hardware or
> architectures report the information to software (OS/BIOS), for example
> via the MCE (Machine Check Exception) on x86.
>
> Common errors include CE (Correctable Errors) and UE (Uncorrectable
> Errors). When the kernel receives memory error information and has the
> memory-failure feature, it can better handle memory errors without a
> reboot. For example, the kernel can attempt to offline the affected memory
> by migrating it or by killing the affected process. Therefore, this feature
> is widely used in servers and embedded fields.

This is a pretty generic description of MCEs.

I think what we are missing is: who runs 32-bit OSes on MCE-capable
hardware (or VMs?) and needs this to work.

What's the use case?

--
Cheers
David