lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGXwabS2UNeZ65rFjMQbKxg+e3zhYiWv_bQkYad5BfPO1LWCMA@mail.gmail.com>
Date:   Wed, 28 Jun 2017 12:44:17 -0500
From:   Jack Miller <jack@...ezen.org>
To:     Borislav Petkov <bp@...e.de>
Cc:     Jack Miller <jack@...ezen.org>, linux-kernel@...r.kernel.org,
        tglx@...utronix.de, Yazen.Ghannam@....com, x86@...nel.org
Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 !=
 thread 0

On Wed, Jun 28, 2017 at 4:22 AM, Borislav Petkov <bp@...e.de> wrote:
> On Tue, Jun 27, 2017 at 07:06:30PM -0500, Jack Miller wrote:
>> After a call to firmware SwitchBSP(),
>
> What is that and who does that?

SwitchBSP() is part of the UEFI MPServices Protocol which I believe is
an extension but it is supported by all of the firmwares I've tested
on.

In this case, I'm using a bootloader to SwitchBSP() so that hardware
thread 0 (and thus core 0) can be offlined on AMD hardware
(cpu0_hotplug unsupported). This is currently working by passing
'nomce' to the kernel, but obviously I'd prefer not to disable it.

>
>> Linux can be booted with a thread
>> that isn't the first in the system. That thread automatically becomes
>> CPU 0.
>
> Btw, you should be seeing other explosions too as a lot of code assumes
> CPU 0 is the BSP.

Actually, with 'nomce' or this patch applied the system seems to chug
along merrily, no further errors in dmesg, no further BUGs. Linux
still gets all of the topology correct (i.e. CPU 0's
core/thread/siblings are correctly identified) so really, aside from
userspace programs doing naive stuff with CPU affinity (like expecting
even,odd CPUs to be SMT pairs), I think the overall result here is
that most threads are interchangeable... except when probing certain
features like these MCA types.

>
> ...
>
>> Signed-off-by: Jack Miller <jack@...ezen.org>
>> ---
>>  arch/x86/kernel/cpu/mcheck/mce_amd.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
>> index 6e4a047e4b68..9d74adcf34d2 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
>> @@ -170,8 +170,8 @@ static void get_smca_bank_info(unsigned int bank)
>>       struct smca_hwid *s_hwid;
>>       u32 high, instance_id;
>>
>> -     /* Collect bank_info using CPU 0 for now. */
>> -     if (cpu)
>> +     /* Collect bank_info using hardware thread 0 for now. */
>> +     if (apic->get_apic_id(apic->read(APIC_ID)) != 0)
>
> Does
>
>         if (cpu != boot_cpu_data.cpu_index)
>                 return;
>
> work?

Unfortunately, it doesn't. That value is explicitly set to 0. Most
mechanisms operate around CPU #, which isn't very helpful if the BSP
was changed under the covers.

Alternatively, we could possibly sidestep the APIC ID uncertainty by
patching get_smca_bank_info() to fallback on reading the bank
hwid_mcatype from other online CPUs (it's already using
rdmsr_safe_on_cpu) if its own hwid_mcatype isn't valid/recognized, but
that's a more invasive patch.

Jack

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ