Message-ID: <ZgVywaHkKVNNfuQ8@chao-email>
Date: Thu, 28 Mar 2024 21:38:09 +0800
From: Chao Gao <chao.gao@...el.com>
To: Xiaoyao Li <xiaoyao.li@...el.com>
CC: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, "Yamahata, Isaku"
	<isaku.yamahata@...el.com>, "Zhang, Tina" <tina.zhang@...el.com>,
	"seanjc@...gle.com" <seanjc@...gle.com>, "Huang, Kai" <kai.huang@...el.com>,
	"Chen, Bo2" <chen.bo@...el.com>, "sagis@...gle.com" <sagis@...gle.com>,
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Aktas, Erdem"
	<erdemaktas@...gle.com>, "isaku.yamahata@...ux.intel.com"
	<isaku.yamahata@...ux.intel.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "Yuan, Hang" <hang.yuan@...el.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH v19 059/130] KVM: x86/tdp_mmu: Don't zap private pages
 for unsupported cases

On Thu, Mar 28, 2024 at 09:21:37PM +0800, Xiaoyao Li wrote:
>On 3/28/2024 6:17 PM, Chao Gao wrote:
>> On Thu, Mar 28, 2024 at 11:40:27AM +0800, Xiaoyao Li wrote:
>> > On 3/28/2024 11:04 AM, Edgecombe, Rick P wrote:
>> > > On Thu, 2024-03-28 at 09:30 +0800, Xiaoyao Li wrote:
>> > > > > The current ABI of KVM_EXIT_X86_RDMSR when TDs are created is nothing.
>> > > > > So I don't see how this is any kind of ABI break. If you agree we
>> > > > > shouldn't try to support MTRRs, do you have a different exit reason
>> > > > > or behavior in mind?
>> > > > 
>> > > > Just return error on TDVMCALL of RDMSR/WRMSR on TD's access of MTRR MSRs.
>> > > 
>> > > MTRR appears to be configured to be type "Fixed" in the TDX module. So the guest could expect to be
>> > > able to use it and be surprised by a #GP.
>> > > 
>> > >           {
>> > >             "MSB": "12",
>> > >             "LSB": "12",
>> > >             "Field Size": "1",
>> > >             "Field Name": "MTRR",
>> > >             "Configuration Details": null,
>> > >             "Bit or Field Virtualization Type": "Fixed",
>> > >             "Virtualization Details": "0x1"
>> > >           },
>> > > 
>> > > If KVM does not support MTRRs in TDX, then it has to return the error somewhere or pretend to
>> > > support it (do nothing but not return an error). Returning an error to the guest would be making up
>> > > arch behavior, and to a lesser degree so would ignoring the WRMSR.
>> > 
>> > The root cause is that it's a bad design of TDX to make MTRR fixed1. When
>> > the guest reads the MTRR CPUID bit as 1 but gets #VE on MTRR MSRs, that
>> > already breaks the architectural behavior. (MCA faces a similar issue: MCA is fixed1 as
>> 
>> I won't say #VE on MTRR MSRs breaks anything. Writes to other MSRs (e.g.
>> TSC_DEADLINE MSR) also lead to #VE. If KVM can emulate the MSR accesses, #VE
>> should be fine.
>> 
>> The problem is: MTRR CPUID feature is fixed 1 while KVM/QEMU doesn't know how
>> to virtualize MTRR especially given that KVM cannot control the memory type in
>> secure-EPT entries.
>
>Yes, I partly agree that #VE on MTRR MSRs doesn't break anything. #VE itself
>is not the problem; the problem is whether the #VE is opt-in or unconditional.

From the guest's p.o.v., there is no difference: the guest doesn't know whether
a feature is opted in or not.
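
A rough sketch of what I mean, with made-up names (struct cpuid_leaf1 and the
helpers below are illustrative only, not from any kernel tree). The only
architectural facts assumed are the bit positions: MTRR is CPUID.1:EDX[12] and
TSC_DEADLINE is CPUID.1:ECX[24]. The guest's decision is a function of the
CPUID bit alone, so a fixed-1 bit and a VMM-configured bit look identical to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural feature bits in CPUID leaf 1. */
#define CPUID_1_EDX_MTRR         (1u << 12)
#define CPUID_1_ECX_TSC_DEADLINE (1u << 24)

/* Hypothetical container for what the guest reads from CPUID leaf 1. */
struct cpuid_leaf1 {
    uint32_t ecx;
    uint32_t edx;
};

/*
 * The guest has no input telling it *why* a bit is set (fixed by the TDX
 * module vs. opted in by the VMM). It only sees the bit value.
 */
static bool guest_will_use_mtrr(const struct cpuid_leaf1 *leaf)
{
    assert(leaf != NULL);
    return (leaf->edx & CPUID_1_EDX_MTRR) != 0;
}

static bool guest_will_use_tsc_deadline(const struct cpuid_leaf1 *leaf)
{
    assert(leaf != NULL);
    return (leaf->ecx & CPUID_1_ECX_TSC_DEADLINE) != 0;
}
```

With MTRR fixed to 1, guest_will_use_mtrr() is true for every TD, whether or
not the VMM can actually back the MSRs.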

>
>For the TSC_DEADLINE MSR, #VE is actually opt-in.
>CPUID(1).ECX[24].TSC_DEADLINE is configurable by the VMM. Only when the VMM
>configures the bit to 1 will the TD guest get #VE. If the VMM configures it to
>0, the TD guest just gets #GP. This is the reasonable design.
>
>> > well, while accessing MCA-related MSRs gets #VE. This is why TDX is going to
>> > fix them by introducing a new feature that makes them configurable.)
>> > 
>> > > So that is why I lean towards
>> > > returning to userspace and giving the VMM the option to ignore it, return an error to the guest or
>> > > show an error to the user.
>> > 
>> > "show an error to the user" doesn't help at all, because the user cannot
>> > fix it, and neither can QEMU.
>> 
>> The key point isn't who can fix/emulate MTRR MSRs. It is just that KVM doesn't
>> know how to handle this situation and asks userspace for help.
>> 
>> Whether or how userspace can handle the MSR writes isn't KVM's problem. It may be
>> better if KVM could tell userspace exactly in which cases KVM will exit to
>> userspace. But there is no such infrastructure.
>> 
>> An example: in the KVM CET series, we found it complex for the KVM instruction
>> emulator to emulate control flow instructions when CET is enabled. The
>> suggestion there is also to punt to userspace (w/o any indication to userspace
>> that KVM would do this).
>
>Could you point me to the decision on CET? I'm interested in how userspace can
>help with that.

https://lore.kernel.org/kvm/ZZgsipXoXTKyvCZT@google.com/

>
>> > 
>> > > If KVM can't support the behavior, better to get an actual error in
>> > > userspace than a mysterious guest hang, right?
>> > What behavior do you mean?
>> > 
>> > > Outside of what kind of exit it is, do you object to the general plan to punt to userspace?
>> > > 
>> > > Since this is a TDX specific limitation, I guess there is KVM_EXIT_TDX_VMCALL as a general category
>> > > of TDVMCALLs that cannot be handled by KVM.
>> 
>> Using KVM_EXIT_TDX_VMCALL looks fine.
>> 
>> We need to explain why MTRR MSRs are handled in this way unlike other MSRs.
>> 
>> It is better if KVM can tell userspace that MTRR virtualization isn't supported
>> by KVM for TDs. Then userspace should resolve the conflict between KVM and TDX
>> module on MTRR. But to report MTRR as unsupported, we need to make
>> GET_SUPPORTED_CPUID a vm-scope ioctl. I am not sure if it is worth the effort.
>
>My memory is that Sean disliked the vm-scope GET_SUPPORTED_CPUID for TDX when
>he was at Intel.

Ok. No strong opinion on this.
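
For reference, a minimal sketch of what "punt to userspace" could look like on
the VMM side. Everything here is hypothetical: struct msr_exit is a stand-in
for whatever exit payload KVM ends up using (it is not the real kvm_run
layout), and the three policies are simply the options Rick listed above
(ignore, return an error to the guest, show an error to the user). The MSR
indices are the architectural ones from the SDM, assuming 8 variable ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Options available to the VMM when KVM punts an MTRR MSR access. */
enum mtrr_policy { MTRR_IGNORE, MTRR_FAULT_GUEST, MTRR_REPORT_TO_USER };

enum exit_action { ACTION_RESUME_GUEST, ACTION_STOP_AND_REPORT, ACTION_NOT_HANDLED };

/* Hypothetical exit payload; a stand-in, not the kvm_run ABI. */
struct msr_exit {
    bool     is_write;
    uint32_t index;
    uint64_t data;   /* value written, or value to return for a read */
    uint8_t  error;  /* nonzero: ask for a fault to be reflected to the guest */
};

/* True for MTRRCAP, DEF_TYPE, variable ranges 0-7, and the fixed-range MSRs. */
static bool is_mtrr_msr(uint32_t index)
{
    if (index == 0x0FE || index == 0x2FF)    /* MTRRCAP, MTRR_DEF_TYPE */
        return true;
    if (index >= 0x200 && index <= 0x20F)    /* PHYSBASE0..PHYSMASK7 */
        return true;
    if (index == 0x250 || index == 0x258 || index == 0x259)  /* FIX64K, FIX16K */
        return true;
    return index >= 0x268 && index <= 0x26F; /* FIX4K_C0000..FIX4K_F8000 */
}

static enum exit_action handle_msr_exit(struct msr_exit *ex, enum mtrr_policy policy)
{
    assert(ex != NULL);

    if (!is_mtrr_msr(ex->index))
        return ACTION_NOT_HANDLED;

    switch (policy) {
    case MTRR_IGNORE:
        /* Drop writes silently; reads return 0. */
        if (!ex->is_write)
            ex->data = 0;
        ex->error = 0;
        return ACTION_RESUME_GUEST;
    case MTRR_FAULT_GUEST:
        ex->error = 1;
        return ACTION_RESUME_GUEST;
    case MTRR_REPORT_TO_USER:
        break;
    }
    return ACTION_STOP_AND_REPORT;
}
```

The point of the sketch is only that the choice among the three behaviors
lives in userspace; KVM does not need to know which one the VMM picks.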
