Message-ID: <BA4DD4E6-D507-4BB9-8CC4-50043049DC86@nutanix.com>
Date: Wed, 2 Apr 2025 13:46:31 +0000
From: Jon Kohler <jon@...anix.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"x86@...nel.org" <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Emanuele Giuseppe Esposito <eesposit@...hat.com>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
Subject: Re: [PATCH] KVM: x86: Expose ARCH_CAP_FB_CLEAR when invulnerable to MDS
> On Apr 2, 2025, at 9:36 AM, Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Mon, Mar 31, 2025, Jon Kohler wrote:
>> Expose FB_CLEAR in arch_capabilities for certain MDS-invulnerable cases
>> to support live migration from older hardware (e.g., Cascade Lake, Ice
>> Lake) to newer hardware (e.g., Sapphire Rapids or higher). This ensures
>> compatibility when user space has previously configured vCPUs to see
>> FB_CLEAR (ARCH_CAPABILITIES Bit 17).
>>
>> Newer hardware sets the following bits but does not set FB_CLEAR, which
>> can prevent user space from configuring a matching setup:
>
> I looked at this again right after PUCK, and KVM does NOT actually prevent
> userspace from matching the original, pre-SPR configuration. KVM effectively
> treats ARCH_CAPABILITIES like a CPUID leaf, and lets userspace shove in any
> value. I.e. userspace can still migrate+stuff FB_CLEAR irrespective of hardware
> support, and thus there is no need for KVM to lie to userspace.
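To make the "shove in any value" point concrete, the sketch below writes an ARCH_CAPABILITIES value into a vCPU through the KVM_SET_MSRS ioctl. It is a minimal illustration, not QEMU's code: the helper name `stuff_arch_capabilities` is hypothetical, and it assumes the vCPU's CPUID already enumerates ARCH_CAPABILITIES and that host-initiated writes are stored as-is, as described above.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Values as defined in arch/x86/include/asm/msr-index.h. */
#define MSR_IA32_ARCH_CAPABILITIES 0x0000010a
#define ARCH_CAP_FB_CLEAR          (1ULL << 17)

/*
 * Hypothetical helper: write a guest ARCH_CAPABILITIES value (e.g. the one
 * carried over from the migration source) into a vCPU with KVM_SET_MSRS.
 * Assumes the vCPU's CPUID already enumerates ARCH_CAPABILITIES.
 * Returns 0 on success, -1 on failure.
 */
static int stuff_arch_capabilities(int vcpu_fd, uint64_t value)
{
	struct kvm_msrs *msrs;
	int ret;

	/* struct kvm_msrs ends in a flexible array of kvm_msr_entry. */
	msrs = calloc(1, sizeof(*msrs) + sizeof(struct kvm_msr_entry));
	if (!msrs)
		return -1;

	msrs->nmsrs = 1;
	msrs->entries[0].index = MSR_IA32_ARCH_CAPABILITIES;
	msrs->entries[0].data = value;

	/* KVM_SET_MSRS returns the number of MSRs it actually wrote. */
	ret = ioctl(vcpu_fd, KVM_SET_MSRS, msrs);
	free(msrs);
	return ret == 1 ? 0 : -1;
}
```

On the destination, userspace would pass the source's value unchanged, FB_CLEAR included, rather than first masking it against the feature-MSR value KVM reports for the new host.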
>
> So in effect, this is a userspace problem where it's being too aggressive in its
> sanity checks.
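One way to picture "too aggressive sanity checks" is a strict subset test that rejects any bit the destination host does not report, next to a relaxed variant that tolerates FB_CLEAR when the destination reports all five MDS-immunity bits (the same condition the proposed patch uses). Both functions are hypothetical policies for illustration; they are not taken from QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in arch/x86/include/asm/msr-index.h. */
#define ARCH_CAP_MDS_NO       (1ULL << 5)
#define ARCH_CAP_TAA_NO       (1ULL << 8)
#define ARCH_CAP_SBDR_SSDP_NO (1ULL << 13)
#define ARCH_CAP_FBSDP_NO     (1ULL << 14)
#define ARCH_CAP_PSDP_NO      (1ULL << 15)
#define ARCH_CAP_FB_CLEAR     (1ULL << 17)

#define MDS_IMMUNE_BITS (ARCH_CAP_MDS_NO | ARCH_CAP_TAA_NO | \
			 ARCH_CAP_PSDP_NO | ARCH_CAP_FBSDP_NO | \
			 ARCH_CAP_SBDR_SSDP_NO)

/* Strict (hypothetical) policy: every requested bit must be host-supported. */
static bool strict_check(uint64_t requested, uint64_t host_supported)
{
	return (requested & ~host_supported) == 0;
}

/*
 * Relaxed (hypothetical) policy: same as above, but ignore a missing
 * FB_CLEAR when the host reports all of the MDS-immunity bits.
 */
static bool relaxed_check(uint64_t requested, uint64_t host_supported)
{
	uint64_t missing = requested & ~host_supported;

	if ((host_supported & MDS_IMMUNE_BITS) == MDS_IMMUNE_BITS)
		missing &= ~ARCH_CAP_FB_CLEAR;

	return missing == 0;
}
```

Any other missing bit still fails the relaxed check, so only the FB_CLEAR case is special-cased.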
>
> FWIW, even if KVM did reject unsupported ARCH_CAPABILITIES bits, I would still
> say this is userspace's problem to solve. E.g. by using MSR filtering to
> intercept and emulate RDMSR(ARCH_CAPABILITIES) in userspace.
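The MSR-filtering route could look roughly like the sketch below: enable KVM_CAP_X86_USER_SPACE_MSR with KVM_MSR_EXIT_REASON_FILTER, install a read filter that denies only ARCH_CAPABILITIES, and answer the resulting KVM_EXIT_X86_RDMSR exits from the run loop. The function names are hypothetical, and the sketch assumes a kernel and headers that provide the MSR filter API.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

#define MSR_IA32_ARCH_CAPABILITIES 0x0000010a

/*
 * Route guest RDMSR(ARCH_CAPABILITIES) to userspace. In a filter range
 * bitmap a set bit allows the access and a cleared bit denies it; with
 * KVM_MSR_EXIT_REASON_FILTER enabled, a denied access exits to userspace
 * as KVM_EXIT_X86_RDMSR instead of injecting #GP.
 */
static int trap_arch_capabilities_reads(int vm_fd)
{
	/* KVM rounds the bitmap size up to whole longs, so back it with one. */
	static unsigned long deny_bitmap[1];	/* all zero: deny */
	struct kvm_enable_cap cap;
	struct kvm_msr_filter filter;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_X86_USER_SPACE_MSR;
	cap.args[0] = KVM_MSR_EXIT_REASON_FILTER;
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -1;

	memset(&filter, 0, sizeof(filter));
	filter.flags = KVM_MSR_FILTER_DEFAULT_ALLOW;	/* other MSRs unaffected */
	filter.ranges[0].flags = KVM_MSR_FILTER_READ;
	filter.ranges[0].base = MSR_IA32_ARCH_CAPABILITIES;
	filter.ranges[0].nmsrs = 1;
	filter.ranges[0].bitmap = (uint8_t *)deny_bitmap;

	return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter) < 0 ? -1 : 0;
}

/* Call from the KVM_RUN loop when run->exit_reason == KVM_EXIT_X86_RDMSR. */
static void complete_rdmsr_exit(struct kvm_run *run, uint64_t emulated_caps)
{
	if (run->msr.index == MSR_IA32_ARCH_CAPABILITIES) {
		run->msr.data = emulated_caps;	/* value the guest will read */
		run->msr.error = 0;
	} else {
		run->msr.error = 1;		/* unexpected MSR: guest gets #GP */
	}
}
```

With this in place the guest-visible value lives entirely in userspace, so KVM's own ARCH_CAPABILITIES value for the vCPU no longer matters for what the guest reads.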

Thanks, Sean, I appreciate it. I’ll see what sort of trouble I can get into on the
user space side of the house with QEMU, and whether there is a clean way to
special-case this.

Cheers, Jon

>
>> ARCH_CAP_MDS_NO
>> ARCH_CAP_TAA_NO
>> ARCH_CAP_PSDP_NO
>> ARCH_CAP_FBSDP_NO
>> ARCH_CAP_SBDR_SSDP_NO
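For reference, the sketch below decodes the bits listed above plus FB_CLEAR (bit 17) from a raw ARCH_CAPABILITIES value. The bit positions follow arch/x86/include/asm/msr-index.h; the decoder itself and the sample value in the test are illustrative only, not a dump from real hardware.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit positions as defined in arch/x86/include/asm/msr-index.h. */
struct arch_cap_bit {
	const char *name;
	unsigned int bit;
};

static const struct arch_cap_bit arch_cap_bits[] = {
	{ "ARCH_CAP_MDS_NO",        5 },
	{ "ARCH_CAP_TAA_NO",        8 },
	{ "ARCH_CAP_SBDR_SSDP_NO", 13 },
	{ "ARCH_CAP_FBSDP_NO",     14 },
	{ "ARCH_CAP_PSDP_NO",      15 },
	{ "ARCH_CAP_FB_CLEAR",     17 },
};

/* Print the state of each bit above and return how many are set. */
static unsigned int decode_arch_caps(uint64_t caps)
{
	unsigned int nset = 0;
	size_t i;

	for (i = 0; i < sizeof(arch_cap_bits) / sizeof(arch_cap_bits[0]); i++) {
		int on = (caps >> arch_cap_bits[i].bit) & 1;

		printf("%-22s bit %2u: %s\n", arch_cap_bits[i].name,
		       arch_cap_bits[i].bit, on ? "set" : "clear");
		nset += on;
	}
	return nset;
}
```

A value with only the five "NO" bits set decodes with FB_CLEAR clear, which is the combination described above for newer hardware.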
>>
>> This change has minimal impact, as these bit combinations already mark
>> the host as MMIO immune (via arch_cap_mmio_immune()) and set
>> disable_fb_clear in vmx_update_fb_clear_dis(), resulting in no
>> additional overhead.
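The "already MMIO immune" argument amounts to saying that the patch's five-bit condition is a superset of the MMIO stale data check. A simplified userspace model is sketched below, assuming arch_cap_mmio_immune() in arch/x86/kernel/cpu/bugs.c requires FBSDP_NO, PSDP_NO and SBDR_SSDP_NO; it does not model vmx_update_fb_clear_dis(). The function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in arch/x86/include/asm/msr-index.h. */
#define ARCH_CAP_MDS_NO       (1ULL << 5)
#define ARCH_CAP_TAA_NO       (1ULL << 8)
#define ARCH_CAP_SBDR_SSDP_NO (1ULL << 13)
#define ARCH_CAP_FBSDP_NO     (1ULL << 14)
#define ARCH_CAP_PSDP_NO      (1ULL << 15)

/* Simplified model of arch_cap_mmio_immune(). */
static bool mmio_immune(uint64_t caps)
{
	return (caps & ARCH_CAP_FBSDP_NO) &&
	       (caps & ARCH_CAP_PSDP_NO) &&
	       (caps & ARCH_CAP_SBDR_SSDP_NO);
}

/* The condition the proposed patch tests before setting FB_CLEAR. */
static bool patch_sets_fb_clear(uint64_t caps)
{
	return (caps & ARCH_CAP_MDS_NO) &&
	       (caps & ARCH_CAP_TAA_NO) &&
	       (caps & ARCH_CAP_PSDP_NO) &&
	       (caps & ARCH_CAP_FBSDP_NO) &&
	       (caps & ARCH_CAP_SBDR_SSDP_NO);
}

/* Over all 32 combinations of the five bits, the patch condition implies MMIO immunity. */
static bool patch_implies_mmio_immune(void)
{
	static const uint64_t bits[] = {
		ARCH_CAP_MDS_NO, ARCH_CAP_TAA_NO, ARCH_CAP_PSDP_NO,
		ARCH_CAP_FBSDP_NO, ARCH_CAP_SBDR_SSDP_NO,
	};
	unsigned int mask, i;

	for (mask = 0; mask < 32; mask++) {
		uint64_t caps = 0;

		for (i = 0; i < 5; i++)
			if (mask & (1u << i))
				caps |= bits[i];
		if (patch_sets_fb_clear(caps) && !mmio_immune(caps))
			return false;
	}
	return true;
}
```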
>>
>> Cc: Emanuele Giuseppe Esposito <eesposit@...hat.com>
>> Cc: Paolo Bonzini <pbonzini@...hat.com>
>> Cc: Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
>> Signed-off-by: Jon Kohler <jon@...anix.com>
>>
>> ---
>> arch/x86/kvm/x86.c | 14 ++++++++++++++
>> 1 file changed, 14 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c841817a914a..2a4337aa78cd 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1641,6 +1641,20 @@ static u64 kvm_get_arch_capabilities(void)
>>  	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>>  		data |= ARCH_CAP_GDS_NO;
>>
>> +	/*
>> +	 * User space might set FB_CLEAR when starting a vCPU on a system
>> +	 * that does not enumerate FB_CLEAR but is also invulnerable to
>> +	 * other various MDS related bugs. To allow live migration from
>> +	 * hosts that do implement FB_CLEAR, leave it enabled.
>> +	 */
>> +	if ((data & ARCH_CAP_MDS_NO) &&
>> +	    (data & ARCH_CAP_TAA_NO) &&
>> +	    (data & ARCH_CAP_PSDP_NO) &&
>> +	    (data & ARCH_CAP_FBSDP_NO) &&
>> +	    (data & ARCH_CAP_SBDR_SSDP_NO)) {
>> +		data |= ARCH_CAP_FB_CLEAR;
>> +	}
>> +
>>  	return data;
>>  }
>>
>> --
>> 2.43.0
>>