lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e0d10606-4472-4cde-b55d-34180efad42b@amd.com>
Date: Thu, 25 Apr 2024 10:12:44 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: Borislav Petkov <bp@...en8.de>, robert.richter@....com
Cc: yazen.ghannam@....com, linux-edac@...r.kernel.org,
 linux-kernel@...r.kernel.org, tony.luck@...el.com, x86@...nel.org,
 Avadhut.Naik@....com, John.Allen@....com
Subject: Re: [PATCH v2 07/16] x86/mce/amd: Simplify DFR handler setup

On 4/24/2024 3:06 PM, Borislav Petkov wrote:
> On Thu, Apr 04, 2024 at 10:13:50AM -0500, Yazen Ghannam wrote:
>> AMD systems with the SUCCOR feature can send an APIC LVT interrupt for
>> deferred errors. The LVT offset is 0x2 by convention, i.e. this is the
>> default as listed in hardware documentation.
>>
>> However, the MCA registers may list a different LVT offset for this
>> interrupt. The kernel should honor the value from the hardware.
> 
> There's this "may" thing again.
> 

Right, I should say "the microarchitecture allows it". :)

> Is this enablement for some future hw too or do you really trust the
> value in MSR_CU_DEF_ERR is programmed correctly in all cases?
> 

I trust the value from hardware.

The intention here is to simplify the code for general maintenance and to make
later patches easier.

>> Simplify the enable flow by using the hardware-provided value. Any
>> conflicts will be caught by setup_APIC_eilvt(). Conflicts on production
>> systems can be handled as quirks, if needed.
> 
> Well, which systems support succor?
> 
> I'd like to test this on them before we face all the quirkery. :)
> 

All Zen/SMCA systems. I don't recall any issues in this area.

Some later Family 15h systems (Carrizo?) had it. But I don't know if it was
used in production. It was slightly before my time.

> That area has been plagued by hw snafus if you look at
> setup_APIC_eilvt() and talk to uncle Robert. :-P
>

Right, I found this:
27afdf2008da ("apic, x86: Use BIOS settings for IBS and MCE threshold
interrupt LVT offsets")

Which is basically the same idea: use what is in the register.

But it looks there was an issue with IBS on Family 10h.

Is this the only case of a real issue? If so, then why apply this method to
the THR and DFR interrupts?

Robert, were there any other issues?

>> @@ -595,17 +584,15 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
>>  	if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
>>  		return;
>>  
>> +	/*
>> +	 * Trust the value from hardware.
>> +	 * If there's a conflict, then setup_APIC_eilvt() will throw an error.
>> +	 */
>>  	def_new = (low & MASK_DEF_LVTOFF) >> 4;
>> -	if (!(low & MASK_DEF_LVTOFF)) {
>> -		pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
>> -		def_new = DEF_LVT_OFF;
>> -		low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
>> -	}
>> +	if (setup_APIC_eilvt(def_new, DEFERRED_ERROR_VECTOR, APIC_EILVT_MSG_FIX, 0))
>> +		return;
>>  
>> -	def_offset = setup_APIC_deferred_error(def_offset, def_new);
>> -	if ((def_offset == def_new) &&
>> -	    (deferred_error_int_vector != amd_deferred_error_interrupt))
>> -		deferred_error_int_vector = amd_deferred_error_interrupt;
> 
> There was a reason for that - deferred_error_int_vector is a global var
> and you're calling enable_deferred_error_interrupt() on each CPU.
>

Right, and all CPUs should use the same APIC LVT offset. If they differ, then
setup_APIC_eilvt() will fail above and return.

Why check "if X != Y, then X = Y"? Why not just unconditionally do "X = Y"?

>> +	deferred_error_int_vector = amd_deferred_error_interrupt;
>>  
>>  	if (!mce_flags.smca)
>>  		low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
> 
> Thx.
> 

Thanks,
Yazen

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ