[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240626120429.GQZnwDzQ47y1fOlFTp@fat_crate.local>
Date: Wed, 26 Jun 2024 14:04:29 +0200
From: Borislav Petkov <bp@...en8.de>
To: Avadhut Naik <avadhut.naik@....com>
Cc: x86@...nel.org, linux-edac@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
linux-kernel@...r.kernel.org, tony.luck@...el.com,
rafael@...nel.org, tglx@...utronix.de, mingo@...hat.com,
rostedt@...dmis.org, lenb@...nel.org, mchehab@...nel.org,
james.morse@....com, airlied@...il.com, yazen.ghannam@....com,
john.allen@....com, avadnaik@....com
Subject: Re: [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA
On Tue, Jun 25, 2024 at 02:56:24PM -0500, Avadhut Naik wrote:
> From: Yazen Ghannam <yazen.ghannam@....com>
>
> A new "FRU Text in MCA" feature is defined where the Field Replaceable
> Unit (FRU) Text for a device is represented by a string in the new
> MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
> bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
>
> The FRU Text is populated dynamically for each individual error state
> (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
> covers multiple devices, for example, a Unified Memory Controller (UMC)
> bank that manages two DIMMs.
>
>From here...
> Print the FRU Text string, if available, when decoding an MCA error.
>
> Also, add field for MCA_CONFIG MSR in struct mce_hw_err as vendor specific
> error information and save the value of the MSR. The very value can then be
> exported through tracepoint for userspace tools like rasdaemon to print FRU
> Text, if available.
>
> Note: Checkpatch checks/warnings are ignored to maintain coding style.
... to here goes into the trash can except what MCA_CONFIG is for being logged
as part of the error.
> [Yazen: Add Avadhut as co-developer for wrapper changes. ]
>
> Co-developed-by: Avadhut Naik <avadhut.naik@....com>
> Signed-off-by: Avadhut Naik <avadhut.naik@....com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
Ditto as for patch 3.
> ---
> @@ -853,8 +850,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>
> if (m->status & MCI_STATUS_SYNDV) {
> pr_cont(", Syndrome: 0x%016llx\n", m->synd);
> - pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> - err->vi.amd.synd1, err->vi.amd.synd2);
> + if (mca_config & MCI_CONFIG_FRUTEXT) {
> + char frutext[17];
> +
> + memset(frutext, 0, sizeof(frutext));
Why are you clearing it if you're overwriting it immediately?
> + memcpy(&frutext[0], &err->vi.amd.synd1, 8);
> + memcpy(&frutext[8], &err->vi.amd.synd2, 8);
> +
> + pr_emerg(HW_ERR "FRU Text: %s", frutext);
> + } else {
> + pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> + err->vi.amd.synd1, err->vi.amd.synd2);
> + }
> }
>
> pr_cont("\n");
> --
> 2.34.1
>
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Powered by blists - more mailing lists