linux-kernel - Re: [PATCH] x86: Intel microcode loader performance improvement


Message-ID: <4B955FF6.5060300@tmr.com>
Date:	Mon, 08 Mar 2010 15:37:10 -0500
From:	Bill Davidsen <davidsen@....com>
To:	Dmitry Adamushko <dmitry.adamushko@...il.com>
CC:	Dimitri Sivanich <sivanich@....com>, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [PATCH] x86: Intel microcode loader performance improvement

Dmitry Adamushko wrote:
> On 5 March 2010 18:42, Dimitri Sivanich <sivanich@....com> wrote:
>> We've noticed that on large SGI UV system configurations, running
>> microcode.ctl can take very long periods of time.  This is due to
>> the large number of vmalloc/vfree calls made by the Intel
>> generic_load_microcode() logic.
>>
>> By reusing allocated space, the following patch reduces the time
>> to run microcode.ctl on a 1024 cpu system from approximately 80
>> seconds down to 1 or 2 seconds.
>>
>> Signed-off-by: Dimitri Sivanich <sivanich@....com>
> 
> This approach seems reasonable in the scope of the current framework.
> 
> Acked-by: Dmitry Adamushko <dmitry.adamushko@...il.com>
> 
> However, I think a better approach would be to have some kind of
> shared storage for loaded microcode updates. Given that for the
> majority of SMP systems all the cpus are normally updated to the very
> same new instance of microcode, it should be enough to do a search for
> the first cpu, cache the instance of microcode and then reuse it for
> others.
> 
The assumption that all CPUs are the same is not always true in practice: people
buy a system without fully populating it at first, and the processors they add
later have a more recent stepping. So reusing microcode or updating in parallel
would add complexity, and 2 sec for 1024 CPUs puts a pretty low upper bound on
possible improvement. Does further improvement to a small one-time delay justify
the additional complexity?

Systems that size are probably not booted all that often. Something to consider 
before putting a lot of effort into it, I think.
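
To make the extra complexity concrete, here is a rough sketch of what a shared
cache would need once mixed steppings are allowed: it cannot hold a single
image, it has to be keyed by CPU signature. This is plain user-space C, not
kernel code; every name in it (ucode_cache, ucode_cache_find, ucode_cache_add,
UCODE_CACHE_MAX) is made up for illustration, and matching on the signature
alone is a simplifying assumption, since the real loader also has to consider
platform flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit on distinct CPU signatures in one system. */
#define UCODE_CACHE_MAX 8

/* One cached microcode image, keyed by the CPU signature it applies to
 * (family/model/stepping as reported by CPUID). */
struct ucode_entry {
	uint32_t sig;
	const void *image;
	size_t size;
};

struct ucode_cache {
	struct ucode_entry entries[UCODE_CACHE_MAX];
	size_t count;
};

/* Return the cached entry for this signature, or NULL if the caller
 * still has to search the firmware file for this CPU. */
static const struct ucode_entry *
ucode_cache_find(const struct ucode_cache *c, uint32_t sig)
{
	for (size_t i = 0; i < c->count; i++)
		if (c->entries[i].sig == sig)
			return &c->entries[i];
	return NULL;
}

/* Remember an image for a signature. Returns 0 on success (including
 * when the signature is already cached), -1 if the cache is full. */
static int ucode_cache_add(struct ucode_cache *c, uint32_t sig,
			   const void *image, size_t size)
{
	if (ucode_cache_find(c, sig))
		return 0;
	if (c->count == UCODE_CACHE_MAX)
		return -1;
	c->entries[c->count].sig = sig;
	c->entries[c->count].image = image;
	c->entries[c->count].size = size;
	c->count++;
	return 0;
}
```

On a uniform system only the first CPU misses and the rest hit the cache; a
later-stepping CPU misses once more and gets its own entry. In the kernel this
would also need locking and lifetime rules for the cached images, none of which
is shown here.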

-- 
Bill Davidsen <davidsen@....com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot