lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <f59e7fe1-b0e1-4f0d-b4da-cf485d8f9914@intel.com>
Date: Wed, 20 Aug 2025 09:31:01 +0800
From: Xiaoyao Li <xiaoyao.li@...el.com>
To: Sean Christopherson <seanjc@...gle.com>,
 Rick P Edgecombe <rick.p.edgecombe@...el.com>
Cc: Adrian Hunter <adrian.hunter@...el.com>, Chao Gao <chao.gao@...el.com>,
 Len Brown <len.brown@...el.com>, Kai Huang <kai.huang@...el.com>,
 "binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>,
 Reinette Chatre <reinette.chatre@...el.com>,
 "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
 "tony.lindgren@...ux.intel.com" <tony.lindgren@...ux.intel.com>,
 "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
 "pbonzini@...hat.com" <pbonzini@...hat.com>,
 Isaku Yamahata <isaku.yamahata@...el.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 Yan Y Zhao <yan.y.zhao@...el.com>, Ira Weiny <ira.weiny@...el.com>
Subject: Re: [PATCH RFC 1/2] KVM: TDX: Disable general support for MWAIT in
 guest

On 8/20/2025 7:35 AM, Sean Christopherson wrote:
> On Tue, Aug 19, 2025, Rick P Edgecombe wrote:
>> On Tue, 2025-08-19 at 10:38 +0300, Adrian Hunter wrote:
>>> On 18/08/2025 21:49, Edgecombe, Rick P wrote:
>>>> Attn: Binbin, Xiaoyao
>>>>
>>>> On Mon, 2025-08-18 at 07:05 -0700, Sean Christopherson wrote:
>>>>> NAK.
>>>>>
>>>>> Fix the guest, or wherever else in the pile there are issues.  KVM is NOT carrying
>>>>> hack-a-fixes to workaround buggy software/firmware.  Been there, done that.
>>>>
>>>> Yes, I would have thought we should have at least had a TDX module change option
>>>> for this.
>>>
>>> That would not help with existing TDX Modules, and would possibly require
>>> a guest opt-in, which would not help with existing guests.  Hence, to start
>>> with disabling the feature first, and look for another solution second.
>>
>> I think you have the priorities wrong. There are only so many kludges we can ask
>> KVM to take. Across all the changes people want for TDX, do you think not having
>> to update the TDX module, backport a guest fix or even just adjust qemu args is
>> more important the other stuff?
> 
> I'm especially sensitive to fudging around _bugs_ in other pieces of the stack.
> KVM has been burned badly, multiple times, by hacking around issues elsewhere.
> There are inevitably cases where throwing something into KVM is the least awful
> choice (usually because it's the only feasible choise), but this ain't one of
> those cases.
> 
>> TDX support is still very early. We need to think about long term sustainable
>> solutions. So a fix that doesn't support existing TDX modules or guests (the
>> intel_idle fix is also in this category anyway) should absolutely be on the
>> table.
>>
>>>
>>> In the MWAIT case, Sean has rejected supporting MSR_PKG_CST_CONFIG_CONTROL
>>> even for VMX, because it is an optional MSR, so altering intel_idle is
>>> being proposed.
> 
> No, I rejected support MSR_PKG_CST_CONFIG_CONTROL _in KVM_ because I don't see
> any reason to shove information into KVM.  AFAICT, it's not an "architectural"
> MSR, and all of KVM's existing handling of truly uarch/model-specific MSRs is
> painful and ugly.

E.g., for MSR_IA32_POWER_CTL (I don't know why it has _IA32_ in the 
name, it seems to me not an architectural MSR), KVM chose to just 
emulate the read/write of it as NOP to workaround the issue that guest 
driver will access it after MWAIT is allowed natively for the guest.

TDX allowed the same workaround by returning true for MSR_IA32_POWER_CTL 
in tdx_has_emulated_msr(). So that when td guest requests emulation from 
KVM on #VE, the workaround can come into play. But with #VE reduction 
enabled by the TD guest, no #VE anymore but #GP.

It seems we cannot remove the workaround because that will break the 
existing guests who rely on the workaround.

> And userspace (e.g. QEMU) could support emulate MSR_PKG_CST_CONFIG_CONTROL (and
> any other MSRs of that nature) via MSR filters.  I doubt the MSR is accessed
> outside of boot paths, so the cost of a userspace exit should be a non-issue.
> Of course, QEMU probably can't provide useful/accurate information.
> 
> One option if there is a super strong need to do so would be to add a "disable
> exits" capability to let the guest read package c-state MSRs, but that has
> obvious downsides and would still just be fudging around a flawed driver.

I have thought about this option as well. Unfortunately there are WRITE 
case in the driver, and we cannot simply pass through WRITE.

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 8dbf19aa66ef..c254aa26ff22 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4120,6 +4120,15 @@ void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>                  vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
>                  vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
>          }
> +       if (kvm_package_cstate_in_guest(vcpu->kvm)) {
> +               vmx_disable_intercept_for_msr(vcpu, MSR_PKG_CST_CONFIG_CONTROL, MSR_TYPE_R);
> +               vmx_disable_intercept_for_msr(vcpu, MSR_PKG_C2_RESIDENCY, MSR_TYPE_R);
> +               vmx_disable_intercept_for_msr(vcpu, MSR_PKG_C3_RESIDENCY, MSR_TYPE_R);
> +               vmx_disable_intercept_for_msr(vcpu, MSR_PKG_C6_RESIDENCY, MSR_TYPE_R);
> +               vmx_disable_intercept_for_msr(vcpu, MSR_PKG_C8_RESIDENCY, MSR_TYPE_R);
> +               vmx_disable_intercept_for_msr(vcpu, MSR_PKG_C9_RESIDENCY, MSR_TYPE_R);
> +               vmx_disable_intercept_for_msr(vcpu, MSR_PKG_C10_RESIDENCY, MSR_TYPE_R);
> +       }
>          if (kvm_aperfmperf_in_guest(vcpu->kvm)) {
>                  vmx_disable_intercept_for_msr(vcpu, MSR_IA32_APERF, MSR_TYPE_R);
>                  vmx_disable_intercept_for_msr(vcpu, MSR_IA32_MPERF, MSR_TYPE_R);


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ