lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 24 Apr 2024 21:06:58 +0200
From: Borislav Petkov <bp@...en8.de>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
	tony.luck@...el.com, x86@...nel.org, Avadhut.Naik@....com,
	John.Allen@....com
Subject: Re: [PATCH v2 07/16] x86/mce/amd: Simplify DFR handler setup

On Thu, Apr 04, 2024 at 10:13:50AM -0500, Yazen Ghannam wrote:
> AMD systems with the SUCCOR feature can send an APIC LVT interrupt for
> deferred errors. The LVT offset is 0x2 by convention, i.e. this is the
> default as listed in hardware documentation.
> 
> However, the MCA registers may list a different LVT offset for this
> interrupt. The kernel should honor the value from the hardware.

There's this "may" thing again.

Is this enablement for some future hw too or do you really trust the
value in MSR_CU_DEF_ERR is programmed correctly in all cases?

> Simplify the enable flow by using the hardware-provided value. Any
> conflicts will be caught by setup_APIC_eilvt(). Conflicts on production
> systems can be handled as quirks, if needed.

Well, which systems support succor?

I'd like to test this on them before we face all the quirkery. :)

That area has been plagued by hw snafus if you look at
setup_APIC_eilvt() and talk to uncle Robert. :-P

> @@ -595,17 +584,15 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
>  	if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
>  		return;
>  
> +	/*
> +	 * Trust the value from hardware.
> +	 * If there's a conflict, then setup_APIC_eilvt() will throw an error.
> +	 */
>  	def_new = (low & MASK_DEF_LVTOFF) >> 4;
> -	if (!(low & MASK_DEF_LVTOFF)) {
> -		pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
> -		def_new = DEF_LVT_OFF;
> -		low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
> -	}
> +	if (setup_APIC_eilvt(def_new, DEFERRED_ERROR_VECTOR, APIC_EILVT_MSG_FIX, 0))
> +		return;
>  
> -	def_offset = setup_APIC_deferred_error(def_offset, def_new);
> -	if ((def_offset == def_new) &&
> -	    (deferred_error_int_vector != amd_deferred_error_interrupt))
> -		deferred_error_int_vector = amd_deferred_error_interrupt;

There was a reason for that - deferred_error_int_vector is a global var
and you're calling enable_deferred_error_interrupt() on each CPU.

> +	deferred_error_int_vector = amd_deferred_error_interrupt;
>  
>  	if (!mce_flags.smca)
>  		low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ