[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3e7413b5-482a-4243-be6c-21a0ee232cc4@intel.com>
Date: Tue, 21 May 2024 12:42:49 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/9] KVM: nVMX: Initialize #VE info page for vmcs02 when
proving #VE support
On 21/05/2024 12:21 pm, Sean Christopherson wrote:
> On Tue, May 21, 2024, Kai Huang wrote:
>> On 21/05/2024 11:22 am, Sean Christopherson wrote:
>>> On Tue, May 21, 2024, Kai Huang wrote:
>>>> On 18/05/2024 12:04 pm, Sean Christopherson wrote:
>>>>> Point vmcs02.VE_INFORMATION_ADDRESS at the vCPU's #VE info page when
>>>>> initializing vmcs02, otherwise KVM will run L2 with EPT Violation #VE
>>>>> enabled and a VE info address pointing at pfn 0.
>>>>
>>>> How about we just clear EPT_VIOLATION_VE bit in 2nd_exec_control
>>>> unconditionally for vmcs02?
>>>
>>> Because then KVM wouldn't get any EPT Violation #VE coverage for L2, and as
>>> evidence by the KVM-Unit-Test failure, running L2 with EPT Violation #VEs enabled
>>> provides unique coverage. Doing so definitely provides coverage beyond what is
>>> strictly needed for TDX, but it's just as easy to set the VE info page in vmcs02
>>> as it is so clear EPT_VIOLATION_VE, so why not.
>>>
>>>> Your next patch says:
>>>>
>>>> "
>>>> Always handle #VEs, e.g. due to prove EPT Violation #VE failures, in L0,
>>>> as KVM does not expose any #VE capabilities to L1, i.e. any and all #VEs
>>>> are KVM's responsibility.
>>>> "
>>>
>>> I don't see how that's relevant to whether or not KVM enables EPT Violation #VEs
>>> while L2 is running. That patch simply routes all #VEs to L0, it doesn't affect
>>> whether or not it's safe to enable EPT Violation #VEs for L2.
>>
>> My logic is, if #VE exit cannot possibly happen for L2, then we don't need
>> to deal whether to route #VE exits to L1. :-)
>>
>> Well, actually I think conceptually, it kinda makes sense to route #VE exits
>> to L1:
>>
>> L1 should never enable #VE related bits so L1 is certainly not expecting to
>
> Not "should never", "can never". If L1 attempts to enable EPT_VIOLATION_VE, then
> VM-Enter will VM-Fail.
>
>> see #VE from L2. But how to act should be depending on L1's logic? E.g., it
>> can choose to ignore, or just kill the L2 etc?
>
> No. Architecturally, from L1's perspective, a #VE VM-Exit _cannot_ occur in L2.
> L1 can inject a #VE into L2, but a #VE cannot be generated by the CPU and thus
> cannot cause a VM-Exit.
OK. The point is not to argue about L1 how to handle, but whether we
should inject to L1 -- L1 can do whatever it believes legal/sane.
But I understand the purpose is to test/validate, so it's fine for L0 to
handle, and by handle it eventually means we want to just dump that #VE
exit.
But now L0 always handles #VE exits from L2, and AFAICT L0 will just
kill the L1, until the patch:
KVM: VMX: Don't kill the VM on an unexpected #VE
lands.
So looks that patch at least should be done first. Otherwise it doesn't
make a lot sense to kill L1 for #VE exits from L2.
>
>> Unconditionally disable #VE in vmcs02 can avoid such issue because it's just
>> not possible for L2 to have the #VE exit.
>
> Sure, but by that argument we could just avoid all nested VMX issues by never
> enabling anything for L2.
>
> If there's an argument to be made for disabling EPT_VIOLATION_VE in vmcs02, it's
> that the potential maintenance cost of keeping nEPT, nVMX, and the shadow MMU
> healthy outweighs the benefits. I.e. we don't have a use case for enabling
> EPT_VIOLATION_VE while L2 is running, so why validate it?
Yeah. I am not sure the purpose of validating #VE exits from L2.
>
> If whatever bug the KUT EPT found ends up being a KVM bug that specifically only
> affects nVMX, then it'd be worth revisiting whether or not it's worth enabling
> EPT_VIOLATION_VE in vmcs02. But that's a rather big "if" at this point.
OK.
Powered by blists - more mailing lists