linux-kernel - RE: [PATCH v2] mm/memory-failure: Support disabling soft offline for HugeTLB pages


Message-ID: <SJ1PR11MB60831F028E2FEB6B5A3390D9FC14A@SJ1PR11MB6083.namprd11.prod.outlook.com>
Date: Tue, 16 Sep 2025 15:20:49 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: "Meyer, Kyle" <kyle.meyer@....com>, Andrew Morton
	<akpm@...ux-foundation.org>
CC: "corbet@....net" <corbet@....net>, "david@...hat.com" <david@...hat.com>,
	"linmiaohe@...wei.com" <linmiaohe@...wei.com>, "shuah@...nel.org"
	<shuah@...nel.org>, "jane.chu@...cle.com" <jane.chu@...cle.com>,
	"jiaqiyan@...gle.com" <jiaqiyan@...gle.com>, "Liam.Howlett@...cle.com"
	<Liam.Howlett@...cle.com>, "bp@...en8.de" <bp@...en8.de>,
	"hannes@...xchg.org" <hannes@...xchg.org>, "jack@...e.cz" <jack@...e.cz>,
	"joel.granados@...nel.org" <joel.granados@...nel.org>, "laoar.shao@...il.com"
	<laoar.shao@...il.com>, "lorenzo.stoakes@...cle.com"
	<lorenzo.stoakes@...cle.com>, "mclapinski@...gle.com"
	<mclapinski@...gle.com>, "mhocko@...e.com" <mhocko@...e.com>,
	"nao.horiguchi@...il.com" <nao.horiguchi@...il.com>, "osalvador@...e.de"
	<osalvador@...e.de>, "Wysocki, Rafael J" <rafael.j.wysocki@...el.com>,
	"rppt@...nel.org" <rppt@...nel.org>, "Anderson, Russ"
	<russ.anderson@....com>, "Fan, Shawn" <shawn.fan@...el.com>,
	"surenb@...gle.com" <surenb@...gle.com>, "vbabka@...e.cz" <vbabka@...e.cz>,
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
	"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: RE: [PATCH v2] mm/memory-failure: Support disabling soft offline for
 HugeTLB pages

>> > Reported-by: Shawn Fan <shawn.fan@...el.com>
>> 
>> Interesting.  What did Shawn report? (Closes:!).
>
> Tony or Shawn, could you please point me to the original report? Thanks!

Original report is internal to Intel, so no useful link for the community (but
I still wanted to give credit).

Recap of the original problem: some BIOSes keep track of an error
threshold per rank and use this GHES mechanism to report that the
threshold has been exceeded on that rank.

Systems that stay up a long time can accumulate enough soft errors
to trigger this threshold. But taking a page offline isn't going to
help. For a 4K page this is merely annoying. For a 1G page it can
mess things up badly.

My original patch for this just skipped the GHES->offline process
for huge pages. But I wasn't aware of the sysctl control. That provides
a better solution.
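
For anyone wanting to check the control from user space, here is a
minimal sketch. It assumes the existing sysctl is exposed as
/proc/sys/vm/enable_soft_offline, which may not be present on older
kernels. read_sysctl_int() and SOFT_OFFLINE_SYSCTL are hypothetical
names used only for illustration, and the sketch does not interpret
any new values this patch may add.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed path of the existing control; verify against your kernel's
 * Documentation/admin-guide/sysctl/vm.rst. */
#define SOFT_OFFLINE_SYSCTL "/proc/sys/vm/enable_soft_offline"

/*
 * Read an integer sysctl value from `path` into `*value`.
 * Returns 0 on success, -1 if the file is missing or cannot be parsed.
 * Hypothetical helper, not a kernel or libc API.
 */
static int read_sysctl_int(const char *path, int *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* The file holds a single decimal integer followed by a newline. */
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);

	return ok ? 0 : -1;
}
```

Calling read_sysctl_int(SOFT_OFFLINE_SYSCTL, &v) from a small main()
gives the current setting. A return of -1 most likely means the
running kernel does not expose the control.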

-Tony