Message-ID: <YvkJjsdlDIcerqLg@araj-dh-work>
Date: Sun, 14 Aug 2022 14:41:18 +0000
From: Ashok Raj <ashok.raj@...el.com>
To: Andrew Cooper <Andrew.Cooper3@...rix.com>
CC: Andy Lutomirski <luto@...nel.org>, Borislav Petkov <bp@...en8.de>,
"Thomas Gleixner" <tglx@...utronix.de>,
Tony Luck <tony.luck@...el.com>,
Dave Hansen <dave.hansen@...el.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
"luto@...capital.net" <luto@...capital.net>,
Tom Lendacky <thomas.lendacky@....com>,
Ashok Raj <ashok.raj@...el.com>
Subject: Re: [PATCH 5/5] x86/microcode: Handle NMI's during microcode update.
Hi Andrew,
On Sun, Aug 14, 2022 at 11:58:17AM +0000, Andrew Cooper wrote:
> >> If I were implementing this, I would rendezvous via stop_machine as usual. Then I would set a flag or install a handler indicating that we are doing a microcode update, send NMI-to-self, and rendezvous in the NMI handler and do the update.
> > Well, that is exactly what I did for the first attempt. The code looked so
> > beautiful in the eyes of the creator :-) but somehow I couldn't get it to
> > not lock up.
>
> So the way we do this in Xen is to rendezvous in stop machine, then have
> only the siblings self-NMI. The primary threads don't need to be in NMI
> context, because the WRMSR to trigger the update *is* atomic with NMIs.
>
> However, you do need to make sure that the NMI wait loop knows not to
> wait for primary threads, otherwise you can deadlock when taking an NMI
> on a primary thread between setting up the NMI handler and actually
> issuing the update.
>
I'm almost sure that was the deadlock I ran into. You are correct, the
primary thread doesn't need to be in NMI, since once the wrmsr starts, it
can't be interrupted.
But the primary needs to wait until its own siblings have dropped into NMI
before proceeding to perform the wrmsr.

The flow would be:

- In the stop_machine() handler, the primary thread waits for its thread
  siblings to enter NMI and report themselves. The siblings simply self-IPI
  and then wait for exit_sync.
- The primary then does the wrmsr flow.
- The primary clears the wait_cpus mask so that the secondaries inside the
  NMI handler can release themselves.
- All threads resync at the exit rendezvous.
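
To make the ordering in the steps above concrete, here is a rough user-space sketch of the handshake for a single core. It is only a simulation under stated assumptions, not the patch itself: pthreads stand in for the primary and sibling hardware threads, an atomic increment stands in for the wrmsr, and every name in it (struct core_sync, in_nmi, simulate_core_update(), and so on) is hypothetical. Only wait_cpus and exit_sync borrow their names from the description above.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_SIBLINGS 8

/* Hypothetical per-core state; these names are illustrative only. */
struct core_sync {
    atomic_int  in_nmi;     /* siblings that have reported from "NMI" */
    atomic_bool wait_cpus;  /* siblings hold in the handler while true */
    atomic_int  exit_sync;  /* threads that reached the exit rendezvous */
    atomic_int  revision;   /* stand-in for the microcode revision */
    int nr_siblings;
    int seen_at_update;     /* siblings parked when the "wrmsr" ran */
};

/* Spin (yielding) until the counter reaches at least target. */
static void wait_for(atomic_int *v, int target)
{
    while (atomic_load(v) < target)
        sched_yield();
}

/* Every thread of the core checks in here before leaving. */
static void exit_rendezvous(struct core_sync *cs)
{
    atomic_fetch_add(&cs->exit_sync, 1);
    wait_for(&cs->exit_sync, cs->nr_siblings + 1);
}

static void *sibling_fn(void *arg)
{
    struct core_sync *cs = arg;

    /* Stand-in for the self-IPI: from here on we are "in the NMI handler". */
    atomic_fetch_add(&cs->in_nmi, 1);
    /* Hold until the primary has finished the update. */
    while (atomic_load(&cs->wait_cpus))
        sched_yield();
    exit_rendezvous(cs);
    return NULL;
}

static void *primary_fn(void *arg)
{
    struct core_sync *cs = arg;

    /* Do not touch the "MSR" until every sibling is parked. */
    wait_for(&cs->in_nmi, cs->nr_siblings);
    cs->seen_at_update = atomic_load(&cs->in_nmi);
    atomic_fetch_add(&cs->revision, 1);   /* stand-in for the wrmsr */
    atomic_store(&cs->wait_cpus, false);  /* release the siblings */
    exit_rendezvous(cs);
    return NULL;
}

/* Runs one simulated update; returns the new revision, or -1 on bad input. */
int simulate_core_update(int nr_siblings, int *seen_at_update)
{
    struct core_sync cs;
    pthread_t primary, sib[MAX_SIBLINGS];

    if (nr_siblings < 1 || nr_siblings > MAX_SIBLINGS)
        return -1;

    atomic_init(&cs.in_nmi, 0);
    atomic_init(&cs.wait_cpus, true);
    atomic_init(&cs.exit_sync, 0);
    atomic_init(&cs.revision, 0);
    cs.nr_siblings = nr_siblings;
    cs.seen_at_update = 0;

    if (pthread_create(&primary, NULL, primary_fn, &cs) != 0)
        abort();
    for (int i = 0; i < nr_siblings; i++)
        if (pthread_create(&sib[i], NULL, sibling_fn, &cs) != 0)
            abort();

    pthread_join(primary, NULL);
    for (int i = 0; i < nr_siblings; i++)
        pthread_join(sib[i], NULL);

    /* The update must only have run with all siblings parked. */
    assert(cs.seen_at_update == nr_siblings);
    *seen_at_update = cs.seen_at_update;
    return atomic_load(&cs.revision);
}
```

Calling simulate_core_update(1, &seen) models one SMT sibling per core. The point of the sketch is the ordering: the primary never runs the update until in_nmi equals the sibling count, and the siblings only wait on the primary's release flag, never on the primary entering NMI itself.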
I have this coded, will test and repost.
Cheers,
Ashok