linux-kernel - Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with SPARSEMEM

Message-ID: <1b316667-470b-4e1a-9c18-e42571e4769c@kernel.org>
Date: Tue, 4 Nov 2025 10:33:39 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Xie Yuanbin <xieyuanbin1@...wei.com>, david@...hat.com,
 dave.hansen@...el.com, bp@...en8.de, tglx@...utronix.de, mingo@...hat.com,
 dave.hansen@...ux.intel.com, hpa@...or.com, akpm@...ux-foundation.org,
 lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
 rppt@...nel.org, surenb@...gle.com, mhocko@...e.com, linmiaohe@...wei.com,
 nao.horiguchi@...il.com, luto@...nel.org, peterz@...radead.org,
 tony.luck@...el.com
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 linux-edac@...r.kernel.org, will@...nel.org, liaohua4@...wei.com,
 lilinjie8@...wei.com
Subject: Re: [PATCH v2 0/2] x86/mm: support memory-failure on 32-bits with
 SPARSEMEM

On 04.11.25 08:23, Xie Yuanbin wrote:
> Memory bit flips are among the most common hardware errors in the server
> and embedded fields. Many hardware components have memory verification
> mechanisms, for example ECC. When an error is detected, some hardware or
> architectures report the information to software (OS/BIOS), for example,
> the MCE (Machine Check Exception) on x86.
> 
> Common errors include CE (Correctable Errors) and UE (Uncorrectable
> Errors). When the kernel receives memory error information, if it has the
> memory-failure feature, it can better handle memory errors without a reboot.
> For example, the kernel can attempt to offline the affected memory by
> migrating it or killing the process. Therefore, this feature is widely
> used in servers and embedded fields.

This is a pretty generic description of MCEs.

I think what we are missing is: who runs 32-bit OSes on MCE-capable
hardware (or VMs?) and needs this to work?

What's the use case?

-- 
Cheers

David