lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 26 Jan 2024 20:45:22 +0100
From: Borislav Petkov <bp@...en8.de>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Avadhut Naik <avadhut.naik@....com>,
	"linux-trace-kernel@...r.kernel.org" <linux-trace-kernel@...r.kernel.org>,
	"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
	"rostedt@...dmis.org" <rostedt@...dmis.org>,
	"x86@...nel.org" <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"yazen.ghannam@....com" <yazen.ghannam@....com>,
	"avadnaik@....com" <avadnaik@....com>
Subject: Re: [PATCH v2 0/2] Update mce_record tracepoint

On Fri, Jan 26, 2024 at 07:15:50PM +0000, Luck, Tony wrote:
> If deployment of a microcode update across a fleet always went
> flawlessly, life would be simpler. But things can fail. And maybe the
> failure wasn't noticed. Perhaps a node was rebooting when the sysadmin
> pushed the update to the fleet and so missed the deployment. Perhaps
> one core was already acting weird and the microcode update didn't get
> applied to that core.

Yes, and you go collect data from that box. You will have to anyway to
figure out why the microcode didn't update.

> Swapping a hard drive, or hot plugging a NIC isn't very likely
> to correlate with an error reported by the CPU in machine
> check banks.

Ofc it is - coherent probe timeoutting due to problematic insertion
could be reported with a MCE, and so on and so on.

> Is it so very different to add this to a trace record so that rasdaemon
> can have feature parity with mcelog(8)?

I knew you were gonna say that. When someone decides that it is
a splendid idea to add more stuff to struct mce then said someone would
want it in the tracepoint too.

And then we're back to my original question: 

"And where does it end? Stick full dmesg in the tracepoint too?"

Where do you draw the line in the sand and say, no more, especially
static, fields bloating the trace record should be added and from then
on, you should go collect the info from that box. Something which you're
supposed to do anyway.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ