Message-ID: <314eedc5-c27e-4e63-b74a-7b06f64fdd86@intel.com>
Date: Sat, 16 Dec 2023 04:41:36 +0530
From: Sohil Mehta <sohil.mehta@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>, "x86@...nel.org" <x86@...nel.org>,
	Borislav Petkov <bp@...en8.de>
CC: Thomas Gleixner <tglx@...utronix.de>, Peter Zijlstra
	<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, Dave Hansen
	<dave.hansen@...ux.intel.com>, "H . Peter Anvin" <hpa@...or.com>, "Yazen
 Ghannam" <yazen.ghannam@....com>, Arnd Bergmann <arnd@...db.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>
Subject: Re: x86/mce: Is mce_is_memory_error() incorrect for Intel?

Thanks Tony for the explanation. It is very helpful.

>> Type                          Form
>> ----                          ----
>> Generic Cache Hierarchy       000F 0000 0000 11LL
>> TLB Errors                    000F 0000 0001 TTLL
>> Memory Controller Errors      000F 0000 1MMM CCCC
>> Cache Hierarchy Errors        000F 0001 RRRR TTLL
>> Extended Memory Errors        000F 0010 1MMM CCCC
>> Bus and Interconnect Errors   000F 1PPT RRRR IILL
>>
>> I am not sure what the practical implications of getting
>> mce_is_memory_error() wrong are. (This issue is completely theoretical right
>> now.) Any insights?
> 
> This function is used to check whether an address is OS addressable memory
> (i.e. for a page that could be taken offline). That doesn't apply to the caching
> use case (the only way to "offline" such a page would be to offline each of the
> slow memory pages that it might be used for).
> 

Makes sense. I am assuming these Extended Memory Errors will no longer be
used (even for CXL.mem-type configurations), so we don't need to include
them in the mce_is_memory_error() check? I'll update the comment
accordingly.
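For reference, here is a minimal sketch that turns the quoted table into
mask/value pairs on MCACOD (bits 15:0 of IA32_MCi_STATUS). The masks are
derived by hand from the table: fixed 0/1 bits go into the mask, the letter
fields (F, LL, TT, MMM, CCCC, RRRR, PP, T, II) do not, so bit 12 (the F bit)
is ignored. The names COMPOUND_ERROR_CODES and classify_mcacod() are
hypothetical, not kernel identifiers. Extended Memory Errors come out as a
class of their own, separate from Memory Controller Errors.

```python
from typing import Optional

# (type from the table, mask, value); hypothetical names, masks derived
# from the compound error code table above. Bit 12 (F) is never in a mask.
COMPOUND_ERROR_CODES = [
    ("Generic Cache Hierarchy",     0xEFFC, 0x000C),  # 000F 0000 0000 11LL
    ("TLB Errors",                  0xEFF0, 0x0010),  # 000F 0000 0001 TTLL
    ("Memory Controller Errors",    0xEF80, 0x0080),  # 000F 0000 1MMM CCCC
    ("Cache Hierarchy Errors",      0xEF00, 0x0100),  # 000F 0001 RRRR TTLL
    ("Extended Memory Errors",      0xEF80, 0x0280),  # 000F 0010 1MMM CCCC
    ("Bus and Interconnect Errors", 0xE800, 0x0800),  # 000F 1PPT RRRR IILL
]


def classify_mcacod(status: int) -> Optional[str]:
    """Return the table row matching MCACOD, or None if no row matches."""
    mcacod = status & 0xFFFF  # MCACOD is the low 16 bits of MCi_STATUS
    for name, mask, value in COMPOUND_ERROR_CODES:
        if (mcacod & mask) == value:
            return name
    return None  # not one of the compound codes listed in the table


if __name__ == "__main__":
    for code in (0x009F, 0x109F, 0x028F, 0x0134, 0x000C):
        print(f"{code:#06x}: {classify_mcacod(code)}")
```

The rows are mutually exclusive (they differ in bits 11:4), so the order of
the list does not affect the result.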

> I'm not quite sure why bit 8 (cache hierarchy error) was added to this check.
> It would seem to have the same issues as extended memory.
> 

From a little digging, it seems the check for "cache hierarchy errors"
has always been there. Commit fa92c5869426 ("x86, mce: Support memory
error recovery for both UCNA and Deferred error in machine_check_poll")
introduced the original checks, but maybe the intention at that time was
different? I see that the CEC (Correctable Errors Collector) was added
later, so maybe memory-related failures were originally handled differently?

So, should we now remove the cache-error-related checks from
mce_is_memory_error()?
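To make the question concrete, a minimal sketch comparing the three Intel
clauses (memory controller, cache hierarchy, generic cache hierarchy) with a
hypothetical variant that keeps only the memory-controller clause. The masks
are derived from the table above and should correspond to the Intel branch of
mce_is_memory_error(), but please check them against the kernel source; the
function names here are made up for illustration.

```python
from typing import Tuple

# (mask, value) pairs on MCACOD; bit 12 (the F bit) is ignored.
MEMORY_CONTROLLER = (0xEF80, 0x0080)  # 000F 0000 1MMM CCCC
CACHE_HIERARCHY   = (0xEF00, 0x0100)  # 000F 0001 RRRR TTLL
GENERIC_CACHE     = (0xEFFC, 0x000C)  # 000F 0000 0000 11LL


def _match(status: int, pattern: Tuple[int, int]) -> bool:
    mask, value = pattern
    return (status & mask) == value


def is_memory_error_current(status: int) -> bool:
    # All three clauses: memory controller, cache hierarchy, generic cache.
    return any(_match(status, p)
               for p in (MEMORY_CONTROLLER, CACHE_HIERARCHY, GENERIC_CACHE))


def is_memory_error_proposed(status: int) -> bool:
    # Hypothetical variant: cache-hierarchy clauses dropped, so only
    # memory-controller errors count as OS-addressable memory.
    return _match(status, MEMORY_CONTROLLER)


if __name__ == "__main__":
    for code in (0x009F, 0x0134, 0x000C, 0x028F):
        print(f"{code:#06x}: current={is_memory_error_current(code)} "
              f"proposed={is_memory_error_proposed(code)}")
```

With these, a cache hierarchy error such as 0x0134 counts as a memory error
today but not under the variant, while an Extended Memory Error such as
0x028F is not counted by either.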
