[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5096b443-1c76-7e74-f596-e994800d7d2b@amd.com>
Date: Thu, 31 Mar 2022 09:41:39 -0500
From: Carlos Bilbao <carlos.bilbao@....com>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: bp@...en8.de, tglx@...utronix.de, mingo@...hat.com,
dave.hansen@...ux.intel.com, x86@...nel.org,
linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
bilbao@...edu
Subject: Re: [PATCH 1/2] x86/mce: Extend AMD severity grading function with
new types of errors
On 3/30/2022 3:02 PM, Yazen Ghannam wrote:
> On Mon, Mar 28, 2022 at 08:41:32AM -0500, Carlos Bilbao wrote:
>> The MCE handler needs to understand the severity of the machine errors to
>> act accordingly. In the case of AMD, very few errors are covered in the
>> grading logic.
>>
>> Extend the MCEs severity grading of AMD to cover new types of machine
>> errors.
>>
>> Signed-off-by: Carlos Bilbao <carlos.bilbao@....com>
>> ---
>> arch/x86/include/asm/mce.h | 6 +
>> arch/x86/kernel/cpu/mce/severity.c | 180 +++++++++++++++++++++++------
>> 2 files changed, 150 insertions(+), 36 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index cc73061e7255..6b1ef40f8580 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -50,6 +50,12 @@
>> #define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
>> #define MCI_STATUS_SCRUB BIT_ULL(40) /* Error detected during scrub operation */
>>
>> +/* AMD Error codes from PPR(s) section 3.1 Machine Check Architecture */
>> +#define ERRORCODE_T_MSK GENMASK(3, 2) /* Mask for transaction type bits */
>> +#define ERRORCODE_M_MSK GENMASK(7, 4) /* Mask for memory transaction type */
>> +#define ERRORCODE_T_DATA 0x4 /* Transaction type of error is Data */
>> +#define ERRORCODE_M_FETCH 0x50 /* Memory transaction type of error is Instruction Fetch */
>> +
>> /*
>> * McaX field if set indicates a given bank supports MCA extensions:
>> * - Deferred error interrupt type is specifiable by bank.
>> diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
>> index 1add86935349..4a089e9dbbaf 100644
>> --- a/arch/x86/kernel/cpu/mce/severity.c
>> +++ b/arch/x86/kernel/cpu/mce/severity.c
>> @@ -327,59 +327,167 @@ static __always_inline int mce_severity_amd_smca(struct mce *m, enum context err
>> }
>>
>> /*
>> - * See AMD Error Scope Hierarchy table in a newer BKDG. For example
>> - * 49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features"
>> + * Evaluate the severity of an overflow error for AMD systems, dependent on
>> + * the recoverity features available.
>> */
>> -static noinstr int mce_severity_amd(struct mce *m, struct pt_regs *regs, char **msg, bool is_excp)
>> +static noinstr int mce_grade_overflow_amd(struct mce *m, enum context ctx)
>> {
>> - enum context ctx = error_context(m, regs);
>> + int ret;
>>
>> - /* Processor Context Corrupt, no need to fumble too much, die! */
>> - if (m->status & MCI_STATUS_PCC)
>> + /*
>> + * On older systems where overflow_recov flag is not present, we
>> + * should simply panic if an error overflow occurs. If
>> + * overflow_recov flag is present and set, then software can try
>> + * to at least kill process to prolong system operation.
>> + */
>> + if (mce_flags.overflow_recov) {
>> + if (mce_flags.smca) {
>> + ret = mce_severity_amd_smca(m, ctx);
>> + } else {
>> + /* kill current process */
>> + ret = MCE_AR_SEVERITY;
>> + }
>> + return ret;
>> + }
>> +
>> + /* at least one error was not logged */
>> + if (m->status & MCI_STATUS_OVER)
>> return MCE_PANIC_SEVERITY;
>>
>> - if (m->status & MCI_STATUS_UC) {
>> + /*
>> + * For any other case, return MCE_UC_SEVERITY so that we log the
>> + * error and exit #MC handler.
>> + */
>> + return MCE_UC_SEVERITY;
>> +}
>>
>> - if (ctx == IN_KERNEL)
>> - return MCE_PANIC_SEVERITY;
>> +/*
>> + * See AMD PPR(s) section 3.1 Machine Check Architecture
>> + */
>> +static noinstr int mce_severity_amd(struct mce *m, struct pt_regs *regs, char **msg, bool is_excp)
>> +{
>> + enum context ctx = error_context(m, regs);
>> + int ret;
>>
>> - /*
>> - * On older systems where overflow_recov flag is not present, we
>> - * should simply panic if an error overflow occurs. If
>> - * overflow_recov flag is present and set, then software can try
>> - * to at least kill process to prolong system operation.
>> - */
>> - if (mce_flags.overflow_recov) {
>> - if (mce_flags.smca)
>> - return mce_severity_amd_smca(m, ctx);
>> + /*
>> + * Default return values. The poll handler catches these and passes
>> + * responsibility of decoding them to EDAC
>> + */
>> + ret = MCE_KEEP_SEVERITY;
>>
>> - /* kill current process */
>> - return MCE_AR_SEVERITY;
>> - } else {
>> - /* at least one error was not logged */
>> - if (m->status & MCI_STATUS_OVER)
>> - return MCE_PANIC_SEVERITY;
>> + /*
>> + * Evaluate the severity of deferred errors for AMD systems, for which only
>> + * scrub error is interesting to notify an action requirement.
>> + */
>> + if (m->status & MCI_STATUS_DEFERRED) {
>> + if (m->status & MCI_STATUS_SCRUB)
>
> Multi-line if-else statements should have braces. This is true even for
> conditions with a single line. This is a coding style point.
>
> In other words, if all conditions are a single line, then no braces needed.
> But if any of the conditions has multiple lines, then all conditions need
> braces.
>
>> + ret = MCE_AR_SEVERITY;
>
> This is wrong. Action required is for errors that need to be handled
> immediately and where recovery is possible, like a poison consumption
> machine check exception.
>
> Also, the SCRUB indicator does not contribute any new or interesting
> information in terms of OS actions.
>
>> + else {
>> + /*
>> + * deferred error: poll handler catches these and adds to mce_ring so
>> + * memory-failure can take recovery actions.
>> + */
>> + ret = MCE_DEFERRED_SEVERITY;
>> }
>> + goto amd_severity;
>> + }
>>
>> - /*
>> - * For any other case, return MCE_UC_SEVERITY so that we log the
>> - * error and exit #MC handler.
>> - */
>> - return MCE_UC_SEVERITY;
>> + /* If the UC bit is not set, the error has been corrected */
>> + if (!(m->status & MCI_STATUS_UC)) {
>> + ret = MCE_KEEP_SEVERITY;
>> + goto amd_severity;
>> }
>>
>> /*
>> - * deferred error: poll handler catches these and adds to mce_ring so
>> - * memory-failure can take recovery actions.
>> + * Evaluate the severity of memory poison for AMD systems,
>> + * depending on the context in which the MCE happened.
>> */
>> - if (m->status & MCI_STATUS_DEFERRED)
>> - return MCE_DEFERRED_SEVERITY;
>> + if (m->status & MCI_STATUS_POISON) {
>> + switch (ctx) {
>> + case IN_USER:
>> + ret = MCE_AR_SEVERITY;
>> + break;
>> + case IN_KERNEL_RECOV:
>> +#ifdef CONFIG_MEMORY_FAILURE
>> + ret = MCE_AR_SEVERITY;
>> +#else
>> + ret = MCE_PANIC_SEVERITY;
>> +#endif
>> + break;
>> + case IN_KERNEL:
>> + ret = MCE_PANIC_SEVERITY;
>> + break;
>
> Is this case needed if it's identical to the default?
>
>> + default:
>> + ret = MCE_PANIC_SEVERITY;
>> + }
>> +
>> + goto amd_severity;
>> + }
>> +
>> + /* Processor Context Corrupt, no need to fumble too much, die! */
>> + if (m->status & MCI_STATUS_PCC) {
>> + ret = MCE_PANIC_SEVERITY;
>> + goto amd_severity;
>> + }
>
> This should be the first condition checked.
>
>>
>> /*
>> - * corrected error: poll handler catches these and passes responsibility
>> - * of decoding the error to EDAC
>> + * Evaluate the severity of data load error for AMD systems,
>> + * depending on the context in which the MCE happened.
>> */
>> - return MCE_KEEP_SEVERITY;
>> + if ((m->status & ERRORCODE_T_MSK) == ERRORCODE_T_DATA) {
>
> This information is not useful for determining severity.
>
>> + switch (ctx) {
>> + case IN_USER:
>> + case IN_KERNEL_RECOV:
>> + ret = MCE_AR_SEVERITY;
>> + break;
>> + case IN_KERNEL:
>> + ret = MCE_PANIC_SEVERITY;
>> + break;
>> + default:
>> + ret = MCE_PANIC_SEVERITY;
>> + }
>> +
>> + goto amd_severity;
>> + }
>> +
>> +
>> + /*
>> + * Evaluate the severity of an instruction fetch error for AMD systems,
>> + * depending on the context in which the MCE happened.
>> + */
>> + if ((m->status & ERRORCODE_M_MSK) == ERRORCODE_M_FETCH) {
>
> This information is not useful for determining severity.
>
>> + switch (ctx) {
>> + case IN_USER:
>> + ret = MCE_AR_SEVERITY;
>> + break;
>> + case IN_KERNEL_RECOV:
>> +#ifdef CONFIG_MEMORY_FAILURE
>> + ret = MCE_AR_SEVERITY;
>> +#else
>> + ret = MCE_PANIC_SEVERITY;
>> +#endif
>> + break;
>> + case IN_KERNEL:
>> + ret = MCE_PANIC_SEVERITY;
>> + break;
>> + default:
>> + ret = MCE_PANIC_SEVERITY;
>> + }
>> +
>> + goto amd_severity;
>> + }
>> +
>> + if (m->status & MCI_STATUS_OVER) {
>> + ret = mce_grade_overflow_amd(m, ctx);
>> + goto amd_severity;
>> + }
>
> This condition should be checked earlier.
>
>> +
>> + if (ctx == IN_KERNEL)
>> + ret = MCE_PANIC_SEVERITY;
>
> This condition should be checked earlier. This would avoid the redundant
> checks above.
>
>
> Overall, the severity code should be audited and simplified first. Existing
> cases should then be updated with useful information like messages. Finally,
> new cases can be added if and only if they are unique, i.e. they're not
> already covered by existing cases.
>
> The current code already follows the hardware documentation for the most part.
> Here's an example of how it can be simplified.
>
> Pseudocode:
>
> if "PCC bit set"
> PANIC
>
> if "Deferred bit set"
> DEFERRED
>
> if "UC bit not set"
> KEEP
>
> if "Overflow recovery not supported" and "Overflow bit set"
> PANIC
>
> if "MCA recovery (SUCCOR) not supported"
> PANIC
>
> if "In kernel context"
> PANIC
>
> else
> AR
>
>
> Notes:
>
> 1) The Deferred and !UC cases are moved up compared to documentation. It
> looks like you had a similar idea. This saves an indentation level since all
> the following cases are for uncorrectable errors.
>
> 2) The TCC check is not included. This is because TCC has a similar function
> to RIPV|EIPV *. And the only use is in conjunction with "kernel context". So
> this is redundant as "UC in kernel context" is a PANIC.
>
> * A TCC check can be done as an alternative to mc_recoverable(). But
> let's not worry about that now.
>
> 3) There is no SMCA check. This is because SMCA is used only to determine if
> the TCC bit *may* be defined.
>
> So please do the code simplification in a first patch. Then add messages for
> those existing cases in a second patch.
>
Thanks Yazen, this approach definitely simplifies everything. I'll send a
v2 of the patchset using this pseudocode.
> Thanks,
> Yazen
Cheers,
Carlos
Powered by blists - more mailing lists