linux-kernel - Re: [Patch v3 Part2 3/9] x86/microcode/intel: Fix collect_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Y9mDYMASXCFaFkNU@zn.tnic>
Date:   Tue, 31 Jan 2023 22:08:48 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     "Raj, Ashok" <ashok.raj@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, x86 <x86@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        "Schofield, Alison" <alison.schofield@...el.com>,
        "Chatre, Reinette" <reinette.chatre@...el.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        Stefan Talpalaru <stefantalpalaru@...oo.com>,
        David Woodhouse <dwmw2@...radead.org>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Jonathan Corbet <corbet@....net>,
        "Rafael J . Wysocki" <rafael@...nel.org>,
        Peter Zilstra <peterz@...radead.org>,
        "Lutomirski, Andy" <luto@...nel.org>,
        "andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>,
        "Ostrovsky, Boris" <boris.ostrovsky@...cle.com>,
        Martin Pohlack <mpohlack@...zon.de>
Subject: Re: [Patch v3 Part2 3/9] x86/microcode/intel: Fix collect_cpu_info()
 to reflect current microcode

On Tue, Jan 31, 2023 at 08:49:52PM +0000, Luck, Tony wrote:
> What happens here if the update on the first hyperthread failed (sure, it shouldn't,
> but stuff happens at large scale).  In this case the current rev is still older that the
> the cache version ... so there is no "goto out", and this hyperthread will now write
> the MSR to initiate microcode update here, while the first thread is off executing
> arbitrary code (the situation that we want to avoid).

Lemme see if I can follow: we sync all threads in __reload_late() and
once they all arrive, we send them down into ->apply_microcode.

T0 arrives, and fails the update. That is this piece:

        /* write microcode via MSR 0x79 */
        wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);

        rev = intel_get_microcode_revision();

        if (rev != mc->hdr.rev) {
                pr_err("CPU%d update to revision 0x%x failed\n",
                       cpu, mc->hdr.rev);
                return UCODE_ERROR;
        }

We return here without updating cpu_sig.rev, as we should.

T1 arrives, updates successfully and updates its cpu_sig.rev.

T0's patch level has been updated too with that because the microcode
engine is shared between the threads. T0's cpu_sig.rev isn't, however,
as that has happened "behind its back", so to speak.

Is that the scenario you're talking about?

If so, if you look at __reload_late(), it'll say

	pr_warn("Error reloading microcode on CPU %d\n", cpu);

and the large scale operator will know.

And well, the easy fix is, do the reload again. :-)

That'll update the cached values too.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette