Date:   Wed, 28 Jun 2017 20:16:34 +0200
From:   Borislav Petkov <bp@...e.de>
To:     Jack Miller <jack@...ezen.org>, Yazen.Ghannam@....com
Cc:     linux-kernel@...r.kernel.org, tglx@...utronix.de, x86@...nel.org
Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 !=
 thread 0

On Wed, Jun 28, 2017 at 12:44:17PM -0500, Jack Miller wrote:
> SwitchBSP() is part of the UEFI MPServices Protocol which I believe is
> an extension but it is supported by all of the firmwares I've tested
> on.

Damn, that ubiquitous firmware. One day the kernel will be just a
userspace process to the fw.

> In this case, I'm using a bootloader to SwitchBSP() so that hardware
> thread 0 (and thus core 0) can be offlined on AMD hardware
> (cpu0_hotplug unsupported).

Why unsupported?

I remember doing some quick experiments with booting with "cpu0_hotplug"
and being able to offline the BSP. It was a long time ago though.

> This is currently working by passing 'nomce' to the kernel, but
> obviously I'd prefer not to disable it.

Right, nomce is not an optimal setting.

> Actually, with 'nomce' or this patch applied the system seems to chug
> along merrily, no further errors in dmesg, no further BUGs. Linux
> still gets all of the topology correct (i.e. CPU 0's
> core/thread/siblings are correctly identified) so really, aside from
> userspace programs doing naive stuff with CPU affinity (like expecting
> even,odd CPUs to be SMT pairs), I think the overall result here is
> that most threads are interchangeable... except when probing certain
> features like these MCA types.
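
For reference, userspace does not have to guess SMT pairs from even/odd CPU numbers: Linux exposes each CPU's siblings in the sysfs file /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, formatted as a CPU list such as "0,4" or "2-3". The following is a minimal sketch, not taken from the thread; the helper names parse_cpulist() and read_thread_siblings() are hypothetical, and the parser handles only comma-separated numbers and ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0,4" or "2-3" into out[].
 * Returns the number of CPUs stored, or -1 on malformed input or overflow
 * of the output array. (Hypothetical helper, for illustration only.)
 */
static int parse_cpulist(const char *s, int *out, size_t max)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (*s != '\0' && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        long hi;

        if (end == s || lo < 0)
            return -1;
        hi = lo;
        s = end;
        if (*s == '-') {                /* range form "lo-hi" */
            s++;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
            s = end;
        }
        for (long c = lo; c <= hi; c++) {
            if (n == max)
                return -1;
            out[n++] = (int)c;
        }
        if (*s == ',')
            s++;
        else if (*s != '\0' && *s != '\n')
            return -1;
    }
    return (int)n;
}

/* Read the SMT siblings of one CPU from sysfs; returns count or -1. */
static int read_thread_siblings(int cpu, int *out, size_t max)
{
    char path[128], buf[256];
    char *line;
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
             cpu);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line != NULL ? parse_cpulist(buf, out, max) : -1;
}
```

A caller would invoke read_thread_siblings(0, sibs, 8) and pin work according to the returned list rather than assuming that CPUs 0 and 1 share a core.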

May I ask what your goal is? Or is it sekrit stuff? physical hotplug
maybe?

> Unfortunately, it doesn't. That value is explicitly set to 0.

Yeah, I see it in smp_store_boot_cpu_info().

So if we had to be really correct, that code there should set the
*actual* CPU index of the BSP and not simply write a 0. It's that

	BSP index == 0

assumption I've been talking about.
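
To make that assumption concrete, here is a minimal userspace model, not kernel code. The struct and function names (toy_cpu, assumes_thread0, really_thread0, count_mismatches) are hypothetical, and the model reduces the real topology data to a logical CPU number plus a hardware thread number. It shows that once firmware has switched the BSP, a test of the form "logical CPU number is 0" no longer identifies hardware thread 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-CPU record; not the kernel's cpuinfo. */
struct toy_cpu {
    unsigned int cpu_index;  /* logical CPU number assigned by the OS */
    unsigned int hw_thread;  /* hardware thread number seen by firmware */
    bool is_bsp;             /* true for the processor that booted */
};

/* The assumption under discussion: logical CPU 0 is hardware thread 0. */
static bool assumes_thread0(const struct toy_cpu *c)
{
    return c->cpu_index == 0;
}

/* What a thread-0-specific probe actually needs to know in this model. */
static bool really_thread0(const struct toy_cpu *c)
{
    return c->hw_thread == 0;
}

/* Count the CPUs for which the assumption and the hardware disagree. */
static size_t count_mismatches(const struct toy_cpu *cpus, size_t n)
{
    size_t bad = 0;

    assert(cpus != NULL);
    for (size_t i = 0; i < n; i++) {
        if (assumes_thread0(&cpus[i]) != really_thread0(&cpus[i]))
            bad++;
    }
    return bad;
}
```

With an unmodified boot (logical CPU 0 on hardware thread 0) count_mismatches() returns 0; with the BSP switched to hardware thread 1 it reports a mismatch both for logical CPU 0 and for whichever logical CPU now sits on hardware thread 0.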

> Most mechanisms operate around CPU #, which isn't very helpful if the
> BSP was changed under the covers.
>
> Alternatively, we could possibly sidestep the APIC ID uncertainty by
> patching get_smca_bank_info() to fall back on reading the bank
> hwid_mcatype from other online CPUs (it's already using
> rdmsr_safe_on_cpu) if its own hwid_mcatype isn't valid/recognized, but
> that's a more invasive patch.

Yeah, I think there is some distinction between reading the MSRs on the
BSP and reading them on the other threads. Yazen did that in

  5896820e0aa3 ("x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types")

Yazen, why CPU 0? Can we get rid of that check there?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)