Date: Fri, 26 Jan 2024 19:15:50 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: Borislav Petkov <bp@...en8.de>
CC: Avadhut Naik <avadhut.naik@....com>, "linux-trace-kernel@...r.kernel.org"
	<linux-trace-kernel@...r.kernel.org>, "linux-edac@...r.kernel.org"
	<linux-edac@...r.kernel.org>, "rostedt@...dmis.org" <rostedt@...dmis.org>,
	"x86@...nel.org" <x86@...nel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "yazen.ghannam@....com"
	<yazen.ghannam@....com>, "avadnaik@....com" <avadnaik@....com>
Subject: RE: [PATCH v2 0/2] Update mce_record tracepoint

> > You've spent enough time with Ashok and Thomas tweaking the Linux
> > microcode driver to know that going back to the machine the next day
> > to ask about microcode version has a bunch of ways to get a wrong
> > answer.
>
> Huh, what does that have to do with this?

If deployment of a microcode update across a fleet always went
flawlessly, life would be simpler. But things can fail. And maybe the
failure wasn't noticed. Perhaps a node was rebooting when the
sysadmin pushed the update to the fleet and so missed the
deployment. Perhaps one core was already acting weird and
the microcode update didn't get applied to that core.
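
The partial-deployment failure described above can be illustrated with a minimal sketch. It assumes an x86 Linux system where /proc/cpuinfo carries a "microcode" line per logical CPU; the function names are hypothetical and not part of any existing tool. It only reports the state at the time it is run, which is exactly why it cannot stand in for a revision captured with the error itself.

```python
from collections import defaultdict


def microcode_by_cpu(cpuinfo_text):
    """Map logical CPU number -> microcode revision string.

    Expects text in the x86 /proc/cpuinfo layout, where each CPU stanza
    has a "processor : N" line followed later by a "microcode : 0x..." line.
    """
    revisions = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between stanzas
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            revisions[cpu] = value
    return revisions


def mismatched(revisions):
    """Group CPUs by revision; return {} when every CPU reports the same one."""
    groups = defaultdict(list)
    for cpu, rev in sorted(revisions.items()):
        groups[rev].append(cpu)
    return dict(groups) if len(groups) > 1 else {}
```

Feeding the contents of /proc/cpuinfo to microcode_by_cpu() and passing the result to mismatched() gives an empty dict on a consistent machine, and otherwise a revision-to-CPU-list mapping that shows which cores missed the update.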

> IIUC, if someone changes something on the system, whether that is
> updating microcode or swapping a harddrive or swapping memory or
> whatever, that invalidates the errors reported, pretty much.
>
> You can't put it all in the trace record, you just can't.

Swapping a hard drive or hot-plugging a NIC isn't very likely
to correlate with an error reported by the CPU in machine
check banks. But microcode can be (and has been) the issue
in enough cases that knowing the version at the time of the
error matters.

You seemed to agree with this argument when the microcode
field was added to "struct mce" back in 2018:

fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records")

Is it so very different to add this to a trace record so that rasdaemon
can have feature parity with mcelog(8)?

-Tony
