lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 29 Apr 2024 20:34:37 +0200
From: Robert Richter <rrichter@....com>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: Borislav Petkov <bp@...en8.de>, linux-edac@...r.kernel.org,
	linux-kernel@...r.kernel.org, tony.luck@...el.com, x86@...nel.org,
	Avadhut.Naik@....com, John.Allen@....com
Subject: Re: [PATCH v2 07/16] x86/mce/amd: Simplify DFR handler setup

On 25.04.24 10:12:44, Yazen Ghannam wrote:
> On 4/24/2024 3:06 PM, Borislav Petkov wrote:
> > On Thu, Apr 04, 2024 at 10:13:50AM -0500, Yazen Ghannam wrote:
> >> AMD systems with the SUCCOR feature can send an APIC LVT interrupt for
> >> deferred errors. The LVT offset is 0x2 by convention, i.e. this is the
> >> default as listed in hardware documentation.
> >>
> >> However, the MCA registers may list a different LVT offset for this
> >> interrupt. The kernel should honor the value from the hardware.
> > 
> > There's this "may" thing again.
> > 
> 
> Right, I should say "the microarchitecture allows it". :)
> 
> > Is this enablement for some future hw too or do you really trust the
> > value in MSR_CU_DEF_ERR is programmed correctly in all cases?
> > 
> 
> I trust the value from hardware.
> 
> The intention here is to simplify the code for general maintenance and to make
> later patches easier.
> 
> >> Simplify the enable flow by using the hardware-provided value. Any
> >> conflicts will be caught by setup_APIC_eilvt(). Conflicts on production
> >> systems can be handled as quirks, if needed.
> > 
> > Well, which systems support succor?
> > 
> > I'd like to test this on them before we face all the quirkery. :)
> > 
> 
> All Zen/SMCA systems. I don't recall any issues in this area.
> 
> Some later Family 15h systems (Carrizo?) had it. But I don't know if it was
> used in production. It was slightly before my time.
> 
> > That area has been plagued by hw snafus if you look at
> > setup_APIC_eilvt() and talk to uncle Robert. :-P
> >
> 
> Right, I found this:
> 27afdf2008da ("apic, x86: Use BIOS settings for IBS and MCE threshold
> interrupt LVT offsets")
> 
> Which is basically the same idea: use what is in the register.
> 
> But it looks there was an issue with IBS on Family 10h.

After looking a while into it I think the issue was the following:

IBS offset was not enabled by firmware, but MCE already was (due to
earlier setup). And mce was (maybe) not on all cpus and only one cpu
per socket enabled. The IBS vector should be enabled on all cpus. Now
firmware allocated offset 1 for mce (instead of offset 0 as for
k8). This caused the hardcoded value (offset 1 for IBS) to be already
taken. Also, hardcoded values couldn't be used at all as this would
have not been worked on k8 (for mce). Another issue was to find the
next free offset as you couldn't examine just the current cpu. So even
if the offset on the current was available, another cpu might have
that offset already in use. Yet another problem was that programmed
offsets for mce and ibs overlapped each other and the kernel had to
reassign them (the ibs offset).

I hope a remember correctly here with all details.

Thanks,

-Robert

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ