Message-ID: <20250302073711.GBZ8QKp1QstGaVGqBR@fat_crate.local>
Date: Sun, 2 Mar 2025 08:37:11 +0100
From: Borislav Petkov <bp@...en8.de>
To: Shuai Xue <xueshuai@...ux.alibaba.com>
Cc: "Luck, Tony" <tony.luck@...el.com>, nao.horiguchi@...il.com,
	tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
	x86@...nel.org, hpa@...or.com, linmiaohe@...wei.com,
	akpm@...ux-foundation.org, peterz@...radead.org,
	jpoimboe@...nel.org, linux-edac@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	baolin.wang@...ux.alibaba.com, tianruidong@...ux.alibaba.com
Subject: Re: [PATCH v2 2/5] x86/mce: dump error msg from severities

On Sun, Mar 02, 2025 at 03:14:52PM +0800, Shuai Xue wrote:
> > >      "mce: Uncorrected hardware memory error in user-access at 3b116c400"
> 
> It is the current message in kill_me_maybe(), not added by me.

Doesn't change the fact that it is not really helpful when it comes to logging
all errors properly.

  [ Properly means using a structured log format with the tracepoint and not
    dumping it into dmesg. ]

The same goes for figuring out what hw is failing so that it can be replaced.
No one has come forward with a real need for making it better, more useful.

You're coming with what I think is such a need and I'm trying to explain to
you what needs to be done. But you want to feed your AI with dmesg and solve
it this way.

If you wanna do it right, we can talk. Otherwise, have fun.

> 3. We need to identify and implement potential improvements.
> 
> "mce: Uncorrected hardware memory error in user-access at 3b116c400"
> 
> is *nothing* but
> 
> "mce: Action required: data load in error recoverable area of kernel"
> 
> helps.

I don't think you've read what I wrote but that's ok. If you think it helps,
you can keep it in your kernels.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
