linux-kernel - Re: [PATCH 1/2] x86/mce: Extend AMD severity grading function with new types of errors

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5096b443-1c76-7e74-f596-e994800d7d2b@amd.com>
Date:   Thu, 31 Mar 2022 09:41:39 -0500
From:   Carlos Bilbao <carlos.bilbao@....com>
To:     Yazen Ghannam <yazen.ghannam@....com>
Cc:     bp@...en8.de, tglx@...utronix.de, mingo@...hat.com,
        dave.hansen@...ux.intel.com, x86@...nel.org,
        linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
        bilbao@...edu
Subject: Re: [PATCH 1/2] x86/mce: Extend AMD severity grading function with
 new types of errors

On 3/30/2022 3:02 PM, Yazen Ghannam wrote:
> On Mon, Mar 28, 2022 at 08:41:32AM -0500, Carlos Bilbao wrote:
>> The MCE handler needs to understand the severity of the machine errors to
>> act accordingly. In the case of AMD, very few errors are covered in the
>> grading logic.
>>
>> Extend the MCEs severity grading of AMD to cover new types of machine
>> errors.
>>
>> Signed-off-by: Carlos Bilbao <carlos.bilbao@....com>
>> ---
>>  arch/x86/include/asm/mce.h         |   6 +
>>  arch/x86/kernel/cpu/mce/severity.c | 180 +++++++++++++++++++++++------
>>  2 files changed, 150 insertions(+), 36 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index cc73061e7255..6b1ef40f8580 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -50,6 +50,12 @@
>>  #define MCI_STATUS_POISON	BIT_ULL(43)  /* access poisonous data */
>>  #define MCI_STATUS_SCRUB	BIT_ULL(40)  /* Error detected during scrub operation */
>>  
>> +/* AMD Error codes from PPR(s) section 3.1 Machine Check Architecture */
>> +#define ERRORCODE_T_MSK GENMASK(3, 2) /* Mask for transaction type bits */
>> +#define ERRORCODE_M_MSK GENMASK(7, 4) /* Mask for memory transaction type */
>> +#define ERRORCODE_T_DATA  0x4  /* Transaction type of error is Data */
>> +#define ERRORCODE_M_FETCH 0x50 /* Memory transaction type of error is Instruction Fetch */
>> +
>>  /*
>>   * McaX field if set indicates a given bank supports MCA extensions:
>>   *  - Deferred error interrupt type is specifiable by bank.
>> diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
>> index 1add86935349..4a089e9dbbaf 100644
>> --- a/arch/x86/kernel/cpu/mce/severity.c
>> +++ b/arch/x86/kernel/cpu/mce/severity.c
>> @@ -327,59 +327,167 @@ static __always_inline int mce_severity_amd_smca(struct mce *m, enum context err
>>  }
>>  
>>  /*
>> - * See AMD Error Scope Hierarchy table in a newer BKDG. For example
>> - * 49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features"
>> + * Evaluate the severity of an overflow error for AMD systems, dependent on
>> + * the recoverity features available.
>>   */
>> -static noinstr int mce_severity_amd(struct mce *m, struct pt_regs *regs, char **msg, bool is_excp)
>> +static noinstr int mce_grade_overflow_amd(struct mce *m, enum context ctx)
>>  {
>> -	enum context ctx = error_context(m, regs);
>> +	int ret;
>>  
>> -	/* Processor Context Corrupt, no need to fumble too much, die! */
>> -	if (m->status & MCI_STATUS_PCC)
>> +	/*
>> +	 * On older systems where overflow_recov flag is not present, we
>> +	 * should simply panic if an error overflow occurs. If
>> +	 * overflow_recov flag is present and set, then software can try
>> +	 * to at least kill process to prolong system operation.
>> +	 */
>> +	if (mce_flags.overflow_recov) {
>> +		if (mce_flags.smca) {
>> +			ret = mce_severity_amd_smca(m, ctx);
>> +		} else {
>> +			/* kill current process */
>> +			ret = MCE_AR_SEVERITY;
>> +		}
>> +		return ret;
>> +	}
>> +
>> +	/* at least one error was not logged */
>> +	if (m->status & MCI_STATUS_OVER)
>>  		return MCE_PANIC_SEVERITY;
>>  
>> -	if (m->status & MCI_STATUS_UC) {
>> +	/*
>> +	 * For any other case, return MCE_UC_SEVERITY so that we log the
>> +	 * error and exit #MC handler.
>> +	 */
>> +	return MCE_UC_SEVERITY;
>> +}
>>  
>> -		if (ctx == IN_KERNEL)
>> -			return MCE_PANIC_SEVERITY;
>> +/*
>> + * See AMD PPR(s) section 3.1 Machine Check Architecture
>> + */
>> +static noinstr int mce_severity_amd(struct mce *m, struct pt_regs *regs, char **msg, bool is_excp)
>> +{
>> +	enum context ctx = error_context(m, regs);
>> +	int ret;
>>  
>> -		/*
>> -		 * On older systems where overflow_recov flag is not present, we
>> -		 * should simply panic if an error overflow occurs. If
>> -		 * overflow_recov flag is present and set, then software can try
>> -		 * to at least kill process to prolong system operation.
>> -		 */
>> -		if (mce_flags.overflow_recov) {
>> -			if (mce_flags.smca)
>> -				return mce_severity_amd_smca(m, ctx);
>> +	/*
>> +	 * Default return values. The poll handler catches these and passes
>> +	 * responsibility of decoding them to EDAC
>> +	 */
>> +	ret = MCE_KEEP_SEVERITY;
>>  
>> -			/* kill current process */
>> -			return MCE_AR_SEVERITY;
>> -		} else {
>> -			/* at least one error was not logged */
>> -			if (m->status & MCI_STATUS_OVER)
>> -				return MCE_PANIC_SEVERITY;
>> +	/*
>> +	 * Evaluate the severity of deferred errors for AMD systems, for which only
>> +	 * scrub error is interesting to notify an action requirement.
>> +	 */
>> +	if (m->status & MCI_STATUS_DEFERRED) {
>> +		if (m->status & MCI_STATUS_SCRUB)
> 
> Multi-line if-else statements should have braces. This is true even for
> conditions with a single line. This is a coding style point.
> 
> In other words, if all conditions are a single line, then no braces needed.
> But if any of the conditions has multiple lines, then all conditions need
> braces.
> 
>> +			ret = MCE_AR_SEVERITY;
> 
> This is wrong. Action required is for errors that need to be handled
> immediately and where recovery is possible, like a poison consumption
> machine check exception.
> 
> Also, the SCRUB indicator does not contribute any new or interesting
> information in terms of OS actions.
> 
>> +		else {
>> +			/*
>> +			 * deferred error: poll handler catches these and adds to mce_ring so
>> +			 * memory-failure can take recovery actions.
>> +			 */
>> +			ret = MCE_DEFERRED_SEVERITY;
>>  		}
>> +		goto amd_severity;
>> +	}
>>  
>> -		/*
>> -		 * For any other case, return MCE_UC_SEVERITY so that we log the
>> -		 * error and exit #MC handler.
>> -		 */
>> -		return MCE_UC_SEVERITY;
>> +	/* If the UC bit is not set, the error has been corrected */
>> +	if (!(m->status & MCI_STATUS_UC)) {
>> +		ret = MCE_KEEP_SEVERITY;
>> +		goto amd_severity;
>>  	}
>>  
>>  	/*
>> -	 * deferred error: poll handler catches these and adds to mce_ring so
>> -	 * memory-failure can take recovery actions.
>> +	 * Evaluate the severity of memory poison for AMD systems,
>> +	 * depending on the context in which the MCE happened.
>>  	 */
>> -	if (m->status & MCI_STATUS_DEFERRED)
>> -		return MCE_DEFERRED_SEVERITY;
>> +	if (m->status & MCI_STATUS_POISON) {
>> +		switch (ctx) {
>> +		case IN_USER:
>> +			ret = MCE_AR_SEVERITY;
>> +			break;
>> +		case IN_KERNEL_RECOV:
>> +#ifdef CONFIG_MEMORY_FAILURE
>> +			ret = MCE_AR_SEVERITY;
>> +#else
>> +			ret = MCE_PANIC_SEVERITY;
>> +#endif
>> +			break;
>> +		case IN_KERNEL:
>> +			ret = MCE_PANIC_SEVERITY;
>> +			break;
> 
> Is this case needed if it's identical to the default?
> 
>> +		default:
>> +			ret = MCE_PANIC_SEVERITY;
>> +		}
>> +
>> +		goto amd_severity;
>> +	}
>> +
>> +	/* Processor Context Corrupt, no need to fumble too much, die! */
>> +	if (m->status & MCI_STATUS_PCC) {
>> +		ret = MCE_PANIC_SEVERITY;
>> +		goto amd_severity;
>> +	}
> 
> This should be the first condition checked.
> 
>>  
>>  	/*
>> -	 * corrected error: poll handler catches these and passes responsibility
>> -	 * of decoding the error to EDAC
>> +	 * Evaluate the severity of data load error for AMD systems,
>> +	 * depending on the context in which the MCE happened.
>>  	 */
>> -	return MCE_KEEP_SEVERITY;
>> +	if ((m->status & ERRORCODE_T_MSK) == ERRORCODE_T_DATA) {
> 
> This information is not useful for determining severity.
> 
>> +		switch (ctx) {
>> +		case IN_USER:
>> +		case IN_KERNEL_RECOV:
>> +			ret = MCE_AR_SEVERITY;
>> +			break;
>> +		case IN_KERNEL:
>> +			ret = MCE_PANIC_SEVERITY;
>> +			break;
>> +		default:
>> +			ret = MCE_PANIC_SEVERITY;
>> +		}
>> +
>> +		goto amd_severity;
>> +	}
>> +
>> +
>> +	/*
>> +	 * Evaluate the severity of an instruction fetch error for AMD systems,
>> +	 * depending on the context in which the MCE happened.
>> +	 */
>> +	if ((m->status & ERRORCODE_M_MSK) == ERRORCODE_M_FETCH) {
> 
> This information is not useful for determining severity.
> 
>> +		switch (ctx) {
>> +		case IN_USER:
>> +			ret = MCE_AR_SEVERITY;
>> +			break;
>> +		case IN_KERNEL_RECOV:
>> +#ifdef CONFIG_MEMORY_FAILURE
>> +			ret = MCE_AR_SEVERITY;
>> +#else
>> +			ret = MCE_PANIC_SEVERITY;
>> +#endif
>> +			break;
>> +		case IN_KERNEL:
>> +			ret = MCE_PANIC_SEVERITY;
>> +			break;
>> +		default:
>> +			ret = MCE_PANIC_SEVERITY;
>> +		}
>> +
>> +		goto amd_severity;
>> +	}
>> +
>> +	if (m->status & MCI_STATUS_OVER) {
>> +		ret = mce_grade_overflow_amd(m, ctx);
>> +		goto amd_severity;
>> +	}
> 
> This condition should be checked earlier.
> 
>> +
>> +	if (ctx == IN_KERNEL)
>> +		ret = MCE_PANIC_SEVERITY;
> 
> This condition should be checked earlier. This would avoid the redundant
> checks above.
> 
> 
> Overall, the severity code should be audited and simplified first. Existing
> cases should then be updated with useful information like messages. Finally,
> new cases can be added if and only if they are unique, i.e. they're not
> already covered by existing cases.
> 
> The current code already follows the hardware documentation for the most part.
> Here's an example of how it can be simplified.
> 
> Pseudocode:
> 
> 	if "PCC bit set"
> 		PANIC
> 
> 	if "Deferred bit set"
> 		DEFERRED
> 
> 	if "UC bit not set"
> 		KEEP
> 
> 	if "Overflow recovery not supported" and "Overflow bit set"
> 		PANIC
> 
> 	if "MCA recovery (SUCCOR) not supported"
> 		PANIC
> 
> 	if "In kernel context"
> 		PANIC
> 
> 	else
> 		AR
> 
> 
> Notes:
> 
> 1) The Deferred and !UC cases are moved up compared to documentation. It
> looks like you had a similar idea. This saves an indentation level since all
> the following cases are for uncorrectable errors.
> 
> 2) The TCC check is not included. This is because TCC has a similar function
> to RIPV|EIPV *. And the only use is in conjunction with "kernel context". So
> this is redundant as "UC in kernel context" is a PANIC.
> 
> 	* A TCC check can be done as an alternative to mc_recoverable(). But
> 	  let's not worry about that now.
> 
> 3) There is no SMCA check. This is because SMCA is used only to determine if
> the TCC bit *may* be defined.
> 
> So please do the code simplification in a first patch. Then add messages for
> those existing cases in a second patch.
> 

Thanks Yazen, this approach definitely simplifies everything. I'll send a 
v2 of the patchset using this pseudocode.

> Thanks,
> Yazen

Cheers,
Carlos