linux-kernel - Re: [PATCH 13/13] x86/microcode/AMD: Remove AP scanning optimization


Message-ID: <alpine.DEB.2.20.1701172219570.3645@nanos>
Date:   Tue, 17 Jan 2017 22:24:50 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Borislav Petkov <bp@...en8.de>
cc:     X86 ML <x86@...nel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 13/13] x86/microcode/AMD: Remove AP scanning
 optimization

On Tue, 17 Jan 2017, Borislav Petkov wrote:

> From: Borislav Petkov <bp@...e.de>
> 
> The idea was to not scan the microcode blob on each AP (Application
> Processor) during boot and thus save us some milliseconds. However, on
> architectures where the microcode engine is shared between threads, this
> doesn't work. Here's why:
> 
> The microcode on CPU0, i.e., the first thread, gets updated. The second
> thread, i.e., CPU1, i.e., the first AP walks into load_ucode_amd_ap(),
> sees that there's no container cached and goes and scans for the proper
> blob.
> 
> It finds it and, as the last step of apply_microcode_early_amd(), tries
> to apply the patch, but that core already has the updated microcode
> revision, which it received through CPU0's update. So it returns
> false and we do desc->size = -1 to prevent other APs from scanning.
>
> However, the next AP, CPU2, has a different microcode engine which
> hasn't been updated yet. The desc->size == -1 test prevents it from
> scanning the blob anew and we fail to update it.

Well, that could be solved by a proper state member in the global container
descriptor. But your solution is better in the end.
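
The failure sequence described in the quoted text can be modeled in a few
lines of C. This is only a sketch under stated assumptions, not the kernel
code: struct cont_desc_model, apply_patch(), load_ucode_ap_model() and
boot_with_optimization() are hypothetical names, and the topology (four
CPUs, two threads per microcode engine) is chosen purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS            4
#define THREADS_PER_ENGINE 2  /* assumption: two threads share one engine */
#define NR_ENGINES         (NR_CPUS / THREADS_PER_ENGINE)
#define OLD_REV            1
#define NEW_REV            2

/* Hypothetical stand-in for the global container descriptor. */
struct cont_desc_model {
    long size;  /* 0: nothing cached yet, -1: "do not scan again" */
};

static int engine_rev[NR_ENGINES];

/* Patch the engine behind @cpu; returns false if it is already current. */
static bool apply_patch(int cpu)
{
    int *rev;

    assert(cpu >= 0 && cpu < NR_CPUS);
    rev = &engine_rev[cpu / THREADS_PER_ENGINE];
    if (*rev >= NEW_REV)
        return false;
    *rev = NEW_REV;
    return true;
}

/* Model of the AP path with the scanning optimization still in place. */
static void load_ucode_ap_model(struct cont_desc_model *desc, int cpu)
{
    if (desc->size == -1)
        return;            /* an earlier AP "failed": skip the scan */
    if (!apply_patch(cpu))
        desc->size = -1;   /* sibling thread was already updated */
}

/* Boot all CPUs; returns how many engines still run the old revision. */
static int boot_with_optimization(void)
{
    struct cont_desc_model desc = { .size = 0 };
    int stale = 0;

    for (int e = 0; e < NR_ENGINES; e++)
        engine_rev[e] = OLD_REV;

    apply_patch(0);        /* BSP updates the engine it shares with CPU1 */
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        load_ucode_ap_model(&desc, cpu);

    for (int e = 0; e < NR_ENGINES; e++)
        stale += (engine_rev[e] != NEW_REV);
    return stale;
}
```

In this model CPU1 finds its engine already updated and sets the sentinel,
so CPU2 and CPU3 never attempt the update and their engine keeps the old
revision.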

> The fix is much more straightforward than it looks: the BSP
> (Bootstrap Processor), i.e., CPU0, caches the microcode patch
> in amd_ucode_patch. We use that on the AP and try to apply it.
> In the 99.9999% of cases where we have homogeneous cores - *not*
> mixed-steppings - the application will be successful and we're good to
> go.
> 
> In the remaining small set of systems, we will simply rescan the blob
> and find (or not, if none present) the proper patch and apply it then.
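
The "try the cached patch first, rescan only on failure" flow from the
quoted description can be sketched as follows. Everything here is a
simplified model under assumptions: struct patch_model, struct
engine_model, the two-entry blob, and the helpers try_apply(), scan_blob(),
load_bsp() and load_ap() are hypothetical, and treating an already-current
engine as a successful application is a choice of this model, not a
statement about the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct patch_model  { int stepping; int rev; };
struct engine_model { int stepping; int rev; };

/* Hypothetical microcode blob: one patch per stepping. */
static const struct patch_model blob[] = {
    { .stepping = 0, .rev = 5 },
    { .stepping = 1, .rev = 7 },
};

/* Stand-in for the patch the BSP caches (amd_ucode_patch in the mail). */
static const struct patch_model *cached_patch;

/* Apply @p to @e; fails if there is no patch or the stepping differs. */
static bool try_apply(struct engine_model *e, const struct patch_model *p)
{
    if (!p || p->stepping != e->stepping)
        return false;
    if (e->rev < p->rev)
        e->rev = p->rev;
    return true;  /* model assumption: already-current counts as success */
}

/* Full scan of the blob for a patch matching @stepping, or NULL. */
static const struct patch_model *scan_blob(int stepping)
{
    for (size_t i = 0; i < sizeof(blob) / sizeof(blob[0]); i++)
        if (blob[i].stepping == stepping)
            return &blob[i];
    return NULL;
}

/* BSP: scan once, cache the result, apply it. */
static void load_bsp(struct engine_model *e)
{
    cached_patch = scan_blob(e->stepping);
    try_apply(e, cached_patch);
}

/* AP: cheap path via the cached patch, full rescan as the fallback. */
static bool load_ap(struct engine_model *e)
{
    if (try_apply(e, cached_patch))
        return true;
    return try_apply(e, scan_blob(e->stepping));
}
```

On a homogeneous system every AP takes the first branch of load_ap(); only
a core with a different stepping pays for the rescan, and a core with no
matching patch in the blob is simply left as it is.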

Makes sense, but how does such a system handle the suspend/resume case when
the microcode is in the initrd? Are you caching the per-CPU patches
somewhere?

Reviewed-by: Thomas Gleixner <tglx@...utronix.de>