linux-kernel - Re: [BUGFIX][PATCH] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures

Message-ID: <4E8C1A74.5090601@linux.vnet.ibm.com>
Date:	Wed, 05 Oct 2011 14:21:00 +0530
From:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To:	Borislav Petkov <bp@...64.org>
CC:	Borislav Petkov <bp@...en8.de>, Tejun Heo <tj@...nel.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	"tigran@...azian.fsnet.co.uk" <tigran@...azian.fsnet.co.uk>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"mingo@...e.hu" <mingo@...e.hu>, "hpa@...or.com" <hpa@...or.com>,
	"x86@...nel.org" <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Linux PM mailing list <linux-pm@...ts.linux-foundation.org>
Subject: Re: [BUGFIX][PATCH] Freezer, CPU hotplug, x86 Microcode: Fix task
 freezing failures

On 10/05/2011 12:51 PM, Borislav Petkov wrote:
> On Tue, Oct 04, 2011 at 04:57:10PM -0400, Srivatsa S. Bhat wrote:
>> 1. Since we never invalidate the microcode once we get it from userspace, it
>>    also means that we will never be able to update the microcode for that cpu
>>    ever again! (since we will continue to reuse the same old microcode over and
>>    over again on every cpu online operation for that cpu).
>>    This restriction introduced by my patch seems bad, doesn't it?
> 
> Well, if you have a new microcode image, you are supposed to place it
> under /lib/firmware/.. or where the kernel has been configured to find
> it and then reload the microcode module.
>
Oh well, then we can update the microcode after all...
 
>> 2. Suppose we have a 16 cpu machine and we boot it with only 8 cpus (i.e., we online
>>    only 8 of the 16 cpus while booting). So it means that the kernel gets a copy
>>    of the microcode for each of these 8 cpus, but not for the ones that were not
>>    onlined while booting.
>>    [Let us assume that cpu number 10 was one among the 8 cpus that were not onlined
>>     while booting].
>>
>>    Later on, let's say we start our cpu hotplug + suspend/resume tests simultaneously.
>>    Now consider this possible scenario:
>>    
>>    * Userspace is not yet frozen.
>>    * We initiate a cpu online operation on cpu 10. At the same time, since suspend
>>      is in progress, let's say freezing begins.
>>    * Just before cpu 10 can be brought online, userspace gets frozen.
>>    * Now while bringing up cpu 10, due to the CPU_ONLINE_FROZEN notification, the
>>      microcode core tries to apply the microcode to the cpu. But unfortunately, it
>>      doesn't have the microcode! (because this cpu is coming up for the first time
>>      and hence we never got its microcode from userspace...)
>>
>>      Now, again the same problem ensues: microcode core calls request_firmware and
>>      depends on the (frozen) userspace to get the microcode.
> 
> Ok, but is this a real-life scenario you expect to happen somewhere or
> is it something that happens only during test? IOW, if you have root
> there are many ways to shoot yourself in the foot, right?
> 

Well, honestly, I was just trying to work out the scenarios in which the
patch might not work well... In real life I don't expect to hit such a
corner case!
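
To make the scenario quoted above concrete, here is a toy userspace model
in Python. It is not kernel code, only a sketch under assumptions: a
per-cpu microcode cache that is filled the first time a cpu comes online,
and a firmware request that needs a working userspace. The names
MicrocodeCore, cpu_online and UserspaceFrozen are made up for
illustration. Only request_firmware corresponds to a real kernel
function, and here it is just a stand-in that raises an exception at the
point where the real call would depend on the frozen userspace.

```python
# Toy userspace model of the scenario above; NOT kernel code.
# MicrocodeCore, cpu_online and UserspaceFrozen are hypothetical names.


class UserspaceFrozen(Exception):
    """Raised when a firmware request would need a frozen userspace."""


class MicrocodeCore:
    def __init__(self):
        self.cache = {}                # cpu number -> cached microcode image
        self.userspace_frozen = False

    def request_firmware(self, cpu):
        # Stand-in for the kernel's request_firmware(), which relies on
        # userspace to supply the image.
        if self.userspace_frozen:
            raise UserspaceFrozen(f"cpu {cpu}: userspace is frozen")
        return f"ucode-for-cpu{cpu}"

    def cpu_online(self, cpu):
        # Reuse the cached image if there is one; otherwise ask userspace.
        if cpu not in self.cache:
            self.cache[cpu] = self.request_firmware(cpu)
        return self.cache[cpu]


core = MicrocodeCore()
for cpu in range(8):                   # boot with 8 of the 16 cpus online
    core.cpu_online(cpu)

core.userspace_frozen = True           # suspend has frozen userspace
core.cpu_online(3)                     # fine: image was cached at boot

try:
    core.cpu_online(10)                # first-ever online of cpu 10
    first_online_ok = True
except UserspaceFrozen:
    first_online_ok = False            # no cached image, userspace frozen
```

In this model, cpus that were onlined at boot never need userspace again,
while cpu 10 has nothing cached and fails as soon as userspace is frozen.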

> [..]
> 
>> I am still wondering if the approach I proposed earlier (the one in
>> which we defer applying microcode and queue up a callback function
>> etc) could solve all these issues. I am also playing around with the
>> idea of coupling that with mutual exclusion between cpu hotplug and
>> freezer to handle any problematic scenarios.
> 
> Well, all those solutions seem like they're not worth the trouble and
> complexity if those cases are only conjecture - if you still trigger
> them during your testing then probably mutually excluding freezer and
> CPU hotplug is something I would lean towards but I could be wrong.
>

I felt the same way (moreover, that complex solution was not foolproof
either!). Please see my other mail, which explains how simply mutually
excluding the freezer and cpu hotplug would solve everything.
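
As a rough illustration of the mutual exclusion idea, here is a minimal
Python sketch that serializes a "freeze" path and a "cpu hotplug" path
behind one shared lock. Everything in it is an assumption made for
illustration: pm_hotplug_lock, freeze_and_suspend and cpu_hotplug are
hypothetical names, and the actual patch may look nothing like this.

```python
import threading

# Hypothetical lock shared by the freezer path and the hotplug path.
pm_hotplug_lock = threading.Lock()
events = []                            # order in which the paths ran


def freeze_and_suspend():
    # The whole freeze sequence runs with the lock held.
    with pm_hotplug_lock:
        events.append("freeze:start")
        events.append("freeze:end")


def cpu_hotplug(cpu):
    # A hotplug operation cannot start while freezing is in progress,
    # and freezing cannot start in the middle of a hotplug operation.
    with pm_hotplug_lock:
        events.append(f"hotplug{cpu}:start")
        events.append(f"hotplug{cpu}:end")


threads = [
    threading.Thread(target=freeze_and_suspend),
    threading.Thread(target=cpu_hotplug, args=(10,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever path takes the lock first runs to completion before the other
starts, so the two operations never interleave.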
 
> There's of course a much better fix which has been on the table for a
> while now involving loading the ucode from the bootloader and applying
> it much earlier than what we have now and keeping the ucode image in
> memory. This would solve the CPU hotplug deal completely. Maybe it's
> time I looked into it :-).
> 

Assuming I understood this correctly, I can see some issues with this
approach as well (since it is quite similar to the approach used in my
one-line patch), but yes, they are definitely all corner cases...

-- 
Regards,
Srivatsa S. Bhat  <srivatsa.bhat@...ux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab