Message-ID: <20250302073711.GBZ8QKp1QstGaVGqBR@fat_crate.local>
Date: Sun, 2 Mar 2025 08:37:11 +0100
From: Borislav Petkov <bp@...en8.de>
To: Shuai Xue <xueshuai@...ux.alibaba.com>
Cc: "Luck, Tony" <tony.luck@...el.com>, nao.horiguchi@...il.com,
tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
x86@...nel.org, hpa@...or.com, linmiaohe@...wei.com,
akpm@...ux-foundation.org, peterz@...radead.org,
jpoimboe@...nel.org, linux-edac@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
baolin.wang@...ux.alibaba.com, tianruidong@...ux.alibaba.com
Subject: Re: [PATCH v2 2/5] x86/mce: dump error msg from severities

On Sun, Mar 02, 2025 at 03:14:52PM +0800, Shuai Xue wrote:
> > > "mce: Uncorrected hardware memory error in user-access at 3b116c400"
>
> It is the current message in kill_me_maybe(), not added by me.

Doesn't change the fact that it is not really helpful when it comes to logging
all errors properly.

[ Properly means using a structured log format with the tracepoint and not
dumping it into dmesg. ]

And figuring out what hw is failing so that it can be replaced. No one has
come with a real need for making it better, more useful.

You're coming with what I think is such a need and I'm trying to explain to
you what needs to be done. But you want to feed your AI with dmesg and solve
it this way.

If you wanna do it right, we can talk. Otherwise, have fun.

> 3. We need to identify and implement potential improvements.
>
> "mce: Uncorrected hardware memory error in user-access at 3b116c400"
>
> is *nothing* but
>
> "mce: Action required: data load in error recoverable area of kernel"
>
> helps.

I don't think you've read what I wrote but that's ok. If you think it helps,
you can keep it in your kernels.

--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette