linux-kernel - Re: [PATCH v2] x86/its: use Sapphire Rapids+ feature to opt out

Message-ID: <C2A57B61-8E6A-4236-9F50-B0662C39272D@nutanix.com>
Date: Fri, 17 Oct 2025 12:21:08 +0000
From: Jon Kohler <jon@...anix.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Jonathan Corbet <corbet@....net>, Ingo Molnar <mingo@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        Brian Gerst <brgerst@...il.com>,
        Brendan Jackman <jackmanb@...gle.com>,
        "Ahmed S. Darwish" <darwi@...utronix.de>,
        Alexandre Chartre <alexandre.chartre@...cle.com>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] x86/its: use Sapphire Rapids+ feature to opt out



> On Oct 17, 2025, at 12:12 AM, Dave Hansen <dave.hansen@...el.com> wrote:
> 
> On 10/16/25 18:18, Jon Kohler wrote:
>> + * hardware, except in the situation where the guest is presented
>> + * with a feature that only exists in non-vulnerable hardware.
>> */
>> - if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
>> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR) &&
>> +    !boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
>> return true;
> 
> This seems like a hack in its purest form. Even worse, it's an
> _uncommented_ hack.

Thanks for the review and comments, Dave.

Yes, it is a hack, and I could have done a better job with it; I’ve proposed
another pass at the bottom. See below for more detail. I’m
hoping we can work out something better before we
completely put this out to pasture.

> This is _literally_ what ITS_NO is for.

Not quite: ITS_NO is for the VMM to drive the opt-out workflow.
The same goes for BHI_CTRL. However, I’ll explain below why relying
on the VMM is a problem for distributions and guests.

> So it's a pretty strong NAK from me on this one. No thanks. If you think
> this is useful, it's a great thing to carry in a local kernel fork, but
> it has no place in mainline.

I understand why you’d NAK this revision of the patch, but I’d love
to have a slightly longer discussion on what we could do to solve
the problem driving this commit.

This isn’t for our own products/kernels, but rather for guest kernels
from distributions that run on our (or anyone else’s) virtualization
products. I’ll admit the commit message should have explained the
motivation for this; that’s what I get for working late :) My apologies.

Here’s the deal:
With ITS on SPR, we see up to a ~3x regression in SAP’s
PBOffline benchmark tool, in a metric they call ‘cputime’. From
the end user’s perspective, this happens out of nowhere when they
update to the ITS-enabled version of the SLES kernel.

That benchmark tracks all sorts of things, including the cumulative
time spent in all calls in their ‘indexserver’ process. The idea is
that they want to track both database/app response time and
the associated cost on the system.

The problem is that a guest kernel cannot control the VMM
configuration, as the original ITS commit points out, so the end
user will automatically see this regression when they deploy or
update their kernel on a VMM that may not expose ITS_NO.

I am going to send patches for QEMU to add ITS_NO today, but
that doesn’t help anyone already in this situation, who will hit this
regression on hardware that Intel has documented as unaffected.

Now, the counterargument is that the kernel also looks at BHI_CTRL,
but as the commit message noted, that didn’t appear in QEMU until
9.2 at the earliest, which is still fairly recent code. Even then, it would
still have to be configured as part of the virt stack; it isn’t enabled
automatically just by booting an SPR model VM on an SPR++ host
with the fixed-up QEMU.

The entire point of the default enablement (at least as far as I can
tell from the docs and the original commit) is the migration pool
scenario that Intel has documented. There, just looking at eIBRS
enumeration wouldn’t be sufficient, because a guest with *only*
eIBRS, even one started on SPR, could be configured without any
SPR++ features and then be migrated to an affected host (e.g. ICX)
at a later point.

Distros can accomplish exactly the same thing in the guest, without
VMM modifications, simply by looking for a feature that is exclusive
to SPR++, knowing that any sane VMM would not (or could not)
allow a guest with higher-level features active to migrate to a
lower-level host.

That all said, that is not what indirect-target-selection.rst says.
The docs say that the reason this is on by default is:
	All guests deploy ITS mitigation by default, irrespective of
	eIBRS enumeration and Family/Model of the guest. This is
	because eIBRS feature could be hidden from a guest.

Using that documentation to improve my approach, how about
this instead, where A) we have better code comments and B) we
also check that eIBRS is enumerated?

static bool __init vulnerable_to_its(u64 x86_arch_cap_msr)
{
...
	/*
	 * Some hypervisors do not expose ITS_NO or BHI_CTRL to guests.
	 * We can nevertheless infer that the underlying CPU is unaffected
	 * by checking for other features that only exist on unaffected
	 * hardware and by requiring that eIBRS is presented to the guest.
	 * If these conditions are met, the hypervisor cannot migrate the
	 * guest to vulnerable hardware without changing the advertised
	 * feature set. Use bus lock detection (introduced on Sapphire
	 * Rapids) as such a proxy feature. This is an intentional
	 * workaround for non-upgraded hypervisors to avoid unnecessary
	 * performance regressions on systems that are not vulnerable.
	 */
	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) &&
	    (x86_arch_cap_msr & ARCH_CAP_IBRS_ALL) &&
	    boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
		return false;

	/*
	 * If a VMM does not expose ITS_NO, and does not expose eIBRS or
	 * other immunity bits, assume that a guest could be running on
	 * vulnerable hardware or may migrate to such hardware.
	 */
	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
		return true;
...
}