linux-kernel - Re: [BUGFIX][PATCH] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures

Date:	Wed, 5 Oct 2011 22:26:58 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Cc:	Borislav Petkov <bp@...64.org>, Borislav Petkov <bp@...en8.de>,
	Tejun Heo <tj@...nel.org>,
	"tigran@...azian.fsnet.co.uk" <tigran@...azian.fsnet.co.uk>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"mingo@...e.hu" <mingo@...e.hu>, "hpa@...or.com" <hpa@...or.com>,
	"x86@...nel.org" <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Linux PM mailing list <linux-pm@...ts.linux-foundation.org>
Subject: Re: [BUGFIX][PATCH] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures

On Wednesday, October 05, 2011, Srivatsa S. Bhat wrote:
> On 10/05/2011 12:51 PM, Borislav Petkov wrote:
> > On Tue, Oct 04, 2011 at 04:57:10PM -0400, Srivatsa S. Bhat wrote:
> >> 1. Since we never invalidate the microcode once we get it from userspace, it
> >>    also means that we will never be able to update the microcode for that cpu
> >>    ever again! (since we will continue to reuse the same old microcode over and
> >>    over again on every cpu online operation for that cpu).
> >>    This restriction introduced by my patch seems bad, doesn't it?
> > 
> > Well, if you have a new microcode image, you are supposed to place it
> > under /lib/firmware/.. or where the kernel has been configured to find
> > it and then reload the microcode module.
> >
> Oh well, then we can update the microcode after all...
>  
> >> 2. Suppose we have a 16 cpu machine and we boot it with only 8 cpus (i.e., we online
> >>    only 8 of the 16 cpus while booting). So it means that the kernel gets a copy
> >>    of the microcode for each of these 8 cpus, but not for the ones that were not
> >>    onlined while booting.
> >>    [Let us assume that cpu number 10 was one among the 8 cpus that were not onlined
> >>     while booting].
> >>
> >>    Later on, let's say we start our cpu hotplug + suspend/resume tests simultaneously.
> >>    Now consider this possible scenario:
> >>    
> >>    * Userspace is not frozen
> >>    * We initiate a cpu online operation on cpu 10. At the same time, since suspend
> >>      is in progress, let's say the freezing begins.
> >>    * Just before cpu 10 could be brought up online, userspace gets frozen.
> >>    * Now while bringing up cpu 10, due to the CPU_ONLINE_FROZEN notification, the
> >>      microcode core tries to apply the microcode to the cpu. But unfortunately, it
> >>      doesn't have the microcode! (because this cpu is coming up for the first time
> >>      and hence we never got its microcode from userspace...)
> >>
> >>      Now, again the same problem ensues: microcode core calls request_firmware and
> >>      depends on the (frozen) userspace to get the microcode.
> > 
> > Ok, but is this a real-life scenario you expect to happen somewhere or
> > is it something that happens only during test? IOW, if you have root
> > there are many ways to shoot yourself in the foot, right?
> > 
> 
> Well, honestly I was just trying to see in which scenarios the patch
> would probably not work well... In real life I don't expect to hit such
> a corner case!
> 
> > [..]
> > 
> >> I am still wondering if the approach I proposed earlier (the one in
> >> which we defer applying microcode and queue up a callback function
> >> etc) could solve all these issues. I am also playing around with the
> >> idea of coupling that with mutual exclusion between cpu hotplug and
> >> freezer to handle any problematic scenarios.
> > 
> > Well, all those solutions seem like they're not worth the trouble and
> > complexity if those cases are only conjecture - if you still trigger
> > them during your testing then probably mutually excluding freezer and
> > CPU hotplug is something I would lean towards but I could be wrong.
> >
> 
> I felt the same way too (moreover, that complex solution was not foolproof
> either!). Please see my other mail which talks about how just mutually
> excluding freezer and cpu hotplugging would solve everything.
>  
> > There's of course a much better fix which has been on the table for a
> > while now involving loading the ucode from the bootloader and applying
> > it much earlier than what we have now and keeping the ucode image in
> > memory. This would solve the CPU hotplug deal completely. Maybe it's
> > time I looked into it :-).
> > 
> 
> Assuming I understood this correctly, I can see some issues in this
> approach as well (since it is quite similar to the approach used in my
> one-line patch), but yeah, definitely they are all very much corner
> cases...

OK, can you please repost the patch with Borislav's Acked-by and Tested-by
and add some more Intel people to the CC list?

Rafael
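
The corner case quoted above (point 2: a CPU onlined for the first time while
userspace is frozen) can be illustrated with a small userspace model. This is
only a sketch and not kernel code: MicrocodeModel, FirmwareUnavailable,
pm_hotplug_lock and cpu_online_exclusive are hypothetical names made up for the
illustration. It also assumes that a hotplug request arriving during freezing
is simply refused, whereas a real implementation would more likely make it wait.

```python
import threading


class FirmwareUnavailable(Exception):
    """The firmware request needs userspace, but userspace is frozen."""


class MicrocodeModel:
    """Toy model of per-CPU microcode caching (hypothetical, not kernel code)."""

    def __init__(self, boot_cpus):
        self.userspace_frozen = False
        self.cache = {}
        # Hypothetical lock standing in for freezer/hotplug mutual exclusion.
        self.pm_hotplug_lock = threading.Lock()
        # CPUs onlined at boot get their microcode while userspace still runs.
        for cpu in boot_cpus:
            self.cache[cpu] = self._request_firmware(cpu)

    def _request_firmware(self, cpu):
        # Models a firmware request that depends on a userspace helper.
        if self.userspace_frozen:
            raise FirmwareUnavailable(f"cpu {cpu}: userspace is frozen")
        return f"ucode-for-cpu{cpu}"

    def cpu_online(self, cpu):
        # Reuse the cached image if there is one; otherwise ask userspace.
        if cpu not in self.cache:
            self.cache[cpu] = self._request_firmware(cpu)
        return self.cache[cpu]

    def suspend_begin(self):
        # The freezer takes the lock, then freezes userspace.
        self.pm_hotplug_lock.acquire()
        self.userspace_frozen = True

    def suspend_end(self):
        self.userspace_frozen = False
        self.pm_hotplug_lock.release()

    def cpu_online_exclusive(self, cpu):
        # With mutual exclusion, hotplug does not run while freezing is in
        # progress (simplification: refuse instead of waiting).
        if not self.pm_hotplug_lock.acquire(blocking=False):
            return None
        try:
            return self.cpu_online(cpu)
        finally:
            self.pm_hotplug_lock.release()


if __name__ == "__main__":
    model = MicrocodeModel(boot_cpus=range(8))  # 8 of 16 CPUs onlined at boot
    model.suspend_begin()
    print(model.cpu_online(3))                  # cached image, works while frozen
    try:
        model.cpu_online(10)                    # never onlined before
    except FirmwareUnavailable as err:
        print("failed:", err)
    print(model.cpu_online_exclusive(10))       # refused while freezing
    model.suspend_end()
    print(model.cpu_online_exclusive(10))       # succeeds after thaw
```

In this model, caching alone covers CPUs that were onlined at least once before
the freeze, while the exclusion lock covers the first-time online case by
keeping hotplug out of the freezing window.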