Message-ID: <54308373-7867-4b76-be34-63730953f83c@intel.com>
Date: Wed, 2 Apr 2025 10:14:09 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Colin Mitchell <colinmitchell@...gle.com>
Cc: bp@...en8.de, chang.seok.bae@...el.com, dave.hansen@...ux.intel.com,
 linux-kernel@...r.kernel.org, mingo@...hat.com, tglx@...utronix.de,
 x86@...nel.org
Subject: Re: [PATCH 0/6] x86/microcode: Support for Intel Staging Feature

On 3/26/25 14:29, Colin Mitchell wrote:
>> On 2/28/25 15:23, Dave Hansen wrote:
>> You seem to be saying that you'd rather be (for instance) insecure
>> running old microcode than have the latency blip from a legacy microcode
>> load.
>> What action would you take if a staging-load fails? Retry again a few
>> times? Go back to the CPU vendor and get a new image? Or just ignore it?
> That's correct, but the latency tradeoff scales with the platform specific
> size of the microcode patch. I'd prefer to have a more deterministic
> update path and believe the potential latency blip would be significant
> enough to justify the option.
> 
> Adding configuration would allow me to handle the error as needed.
> A retry loop would be a first step but I could also look to migrate VMs
> off the machine if the platform specific latency blip would negatively 
> affect sensitive guest VMs. While an ideal solution imo would then
> allow me to force legacy loading, I could also settle with it being done
> through a reboot where early boot would already skip staging.

There's a lot to unpack there.

But, for the purposes of this series, I think what's here is fine for
now. Let's leave staging _purely_ as an opportunistic optimization.

If folks want to make this more configurable, such as making staging
*mandatory* and disabling legacy loading, then we'll look at the patches
(and their justifications) as folks submit them. A good justification
would be something along these lines:

	Legacy microcode loading causes a 5,000ms latency blip. Our
	customers have been complaining to us for years about those
	legacy loading blips. Migrating a VM causes a 1ms latency blip.
	Those 4,999ms mean a lot to the folks running those VMs. As a
	CSP, we would like the flexibility to avoid the gigantic legacy
	microcode loading blips because they are bad and getting worse.

It becomes less compelling if it's something like this:

	Legacy microcode loading causes a 50ms latency blip. Migrating a
	VM causes a 49ms latency blip. That millisecond is super
	important.

... and increasingly less so as it becomes:

	We like knobs and flexibility for $REASONS.

You don't have to have down-to-the-millisecond numbers here. Orders of
magnitude are fine. But if you can't demonstrate (or don't anticipate)
an order-of-magnitude improvement from the knob, then it's probably not
worth it. It had better be a 10x improvement, not 10%.
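
As a rough illustration of that rule of thumb, here is a minimal sketch
applied to the example numbers above. The function names and the default
10x threshold are made up for this example; nothing like this exists in
the series or in the kernel.

```python
def improvement_ratio(legacy_ms: float, alternative_ms: float) -> float:
    """Return how many times larger the legacy-load blip is than the alternative."""
    if legacy_ms < 0 or alternative_ms <= 0:
        raise ValueError("legacy_ms must be >= 0 and alternative_ms must be > 0")
    return legacy_ms / alternative_ms


def worth_a_knob(legacy_ms: float, alternative_ms: float,
                 threshold: float = 10.0) -> bool:
    """Hypothetical check: is the improvement at least `threshold` times?"""
    return improvement_ratio(legacy_ms, alternative_ms) >= threshold


# Example numbers from the two hypothetical justifications:
# (legacy load blip, VM migration blip), both in milliseconds.
for legacy, migrate in [(5000, 1), (50, 49)]:
    ratio = improvement_ratio(legacy, migrate)
    verdict = worth_a_knob(legacy, migrate)
    print(f"{legacy}ms vs {migrate}ms: {ratio:.2f}x, worth a knob: {verdict}")
```

The first case clears the bar by a wide margin; the second is about a 2%
difference and does not.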
