Message-ID: <YvzXsf0mGEcOlZC5@araj-dh-work>
Date:   Wed, 17 Aug 2022 11:57:37 +0000
From:   Ashok Raj <ashok.raj@...el.com>
To:     Borislav Petkov <bp@...en8.de>
CC:     Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Tony Luck" <tony.luck@...el.com>,
        Dave Hansen <dave.hansen@...el.com>,
        "LKML Mailing List" <linux-kernel@...r.kernel.org>,
        X86-kernel <x86@...nel.org>,
        "Andy Lutomirski" <luto@...capital.net>,
        Tom Lendacky <thomas.lendacky@....com>,
        Jacon Jun Pan <jacob.jun.pan@...el.com>,
        Ashok Raj <ashok.raj@...el.com>
Subject: Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during
 microcode update

On Wed, Aug 17, 2022 at 10:09:00AM +0200, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 09:58:03AM +0200, Ingo Molnar wrote:
> > Also, Boris tells me that writing 0x0 to MSR_IA32_MCG_STATUS
> > apparently shuts the platform down - which is not ideal...
> 
> Right, if you get an MCE raised while MCIP=0, the machine shuts down.
> 
> And frankly, I can't think of a good solution to this whole issue:
> 
> - with current hw, if you get an MCE and MCIP=0 -> shutdown

You have this reversed. If you get an MCE and MCIP=1 -> shutdown

I'm still very reluctant; this is actually overkill. I added what is
possible based on Boris's recommendation.

When MCEs happen during the update they are always fatal errors. But
at least you can log them, even if some other weird error were to be
observed because they stomped over the patch area that the primary is
currently working on.

What we do here, by setting MCIP=1, is promote it to a more severe shutdown.

Ideally I would rather let the fallout happen, since it's observable,
whereas what we are promoting to is a blind shutdown.
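
As a rough illustration of the MCIP semantics being discussed, here is a
small user-space model. It is not kernel code and does not touch any MSR.
The bit position of MCIP matches the kernel's MCG_STATUS_MCIP definition,
but the enum and the function model_mce_delivery() are hypothetical names
made up for this sketch.

```c
#include <stdint.h>

/* MCIP (machine check in progress) is bit 2 of IA32_MCG_STATUS. */
#define MCG_STATUS_MCIP (1ULL << 2)

/* Hypothetical outcome type, used only by this sketch. */
enum mce_outcome {
	MCE_HANDLER_RUNS,	/* handler gets a chance to log the error */
	MCE_SHUTDOWN,		/* blind shutdown, nothing is logged */
};

/*
 * Model what happens when a machine check is signaled, given the
 * current (simulated) MCG_STATUS value:
 *  - MCIP already set: the processor shuts down.
 *  - MCIP clear: MCIP is set and the handler is entered.
 */
static enum mce_outcome model_mce_delivery(uint64_t *mcg_status)
{
	if (*mcg_status & MCG_STATUS_MCIP)
		return MCE_SHUTDOWN;

	*mcg_status |= MCG_STATUS_MCIP;
	return MCE_HANDLER_RUNS;
}
```

In this model, pre-setting MCIP before the update means the first MCE
already takes the MCE_SHUTDOWN path, which is the "promotion" to a blind
shutdown described above.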

> 
> - in the future, even if you change the hardware to block MCEs from
> being detected while the microcode update runs, what happens if a CPU
> encounters a hw error during that update?

I don't think there will ever be blocking of MCEs :-)

If an error happens, it leads to shutdown.
> 
> You raise it immediately after? What if there are multiple MCEs? Not
> unheard of on a big machine...

Shutdown, shutdown.. There is only 1 MCE no matter how many CPUs you have.

The exception is the Local MCE, which is recoverable, but only to user space.

If you get an error in the atomic we are polling, it's a fatal error since
SW can't recover and we shut down.
> 
> Worse, what happens if there's a bitflip in the memory where the
> to-be-updated microcode patch is?
> 
> You report the error afterwards?
> 
> Just thinking about this makes me real nervous.

Overthinking :-).. If there is consensus, and if Boris feels comfortable
enough, I would drop this patch.
