linux-kernel - Re: [RFC PATCH v5 092/104] KVM: TDX: Handle TDX PV HLT hypercall

Message-ID: <8e0280ab-c7aa-5d01-a36f-93d0d0d79e25@redhat.com>
Date:   Thu, 7 Apr 2022 17:56:05 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     isaku.yamahata@...el.com, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, isaku.yamahata@...il.com,
        Jim Mattson <jmattson@...gle.com>, erdemaktas@...gle.com,
        Connor Kuehl <ckuehl@...hat.com>
Subject: Re: [RFC PATCH v5 092/104] KVM: TDX: Handle TDX PV HLT hypercall

On 4/7/22 17:02, Sean Christopherson wrote:
> On Thu, Apr 07, 2022, Paolo Bonzini wrote:
>> On 3/4/22 20:49, isaku.yamahata@...el.com wrote:
>>> +	bool interrupt_disabled = tdvmcall_p1_read(vcpu);
>>
>> Where is R12 documented for TDG.VP.VMCALL<Instruction.HLT>?
>>
>>> +		 * Virtual interrupt can arrive after TDG.VM.VMCALL<HLT> during
>>> +		 * the TDX module executing.  On the other hand, KVM doesn't
>>> +		 * know if vcpu was executing in the guest TD or the TDX module.
>>
>> I don't understand this; why isn't it enough to check PI.ON or something
>> like that as part of HLT emulation?
> 
> Ooh, I think I remember what this is.  This is for the case where the virtual
> interrupt is recognized, i.e. set in vmcs.RVI, between the STI and "HLT".  KVM
> doesn't have access to RVI and the interrupt is no longer in the PID (because it
> was "recognized").  It doesn't get delivered in the guest because the TDCALL
> completes before interrupts are enabled.
> 
> I lobbied to get this fixed in the TDX module by immediately resuming the guest
> in this case, but obviously that was unsuccessful.

So the TDX module sets RVI while in an STI interrupt shadow.  So far so 
good.  Then:

- it receives the HLT TDCALL from the guest.  The interrupt shadow at 
this point is gone.

- it knows that there is an interrupt that can be delivered (RVI > PPR 
&& EFLAGS.IF=1; the other conditions of 29.2.2 don't matter)

- it forwards the HLT TDCALL nevertheless, to a clueless hypervisor that 
has no way to glean either RVI or PPR?

It's absurd that this be treated as anything but a bug.


Until that is fixed, KVM needs to do something like:

- every time a bit is set in PID.PIR, set tdx->buggy_hlt_workaround = 1

- every time TDG.VP.VMCALL<HLT> is received, 
xchg(&tdx->buggy_hlt_workaround, 0) and return immediately to the guest 
if it is 1.

Basically an internal version of PID.ON.
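
Roughly like this; a completely untested userspace sketch, with C11 
atomic_exchange() standing in for the kernel's xchg().  The struct and 
function names are made up for illustration and are not from the patch; 
only the buggy_hlt_workaround flag comes from the description above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state; only the workaround flag matters here. */
struct tdx_vcpu_sketch {
	atomic_int buggy_hlt_workaround;
};

/* Hypothetical hook: call wherever a bit is set in PID.PIR. */
static void sketch_note_pir_set(struct tdx_vcpu_sketch *tdx)
{
	atomic_store(&tdx->buggy_hlt_workaround, 1);
}

/*
 * Hypothetical hook: call when TDG.VP.VMCALL<HLT> is received.
 * Atomically reads and clears the flag; returns true if an interrupt
 * was posted since the last HLT, i.e. the vCPU should go straight
 * back to the guest instead of blocking.
 */
static bool sketch_hlt_should_resume(struct tdx_vcpu_sketch *tdx)
{
	return atomic_exchange(&tdx->buggy_hlt_workaround, 0) == 1;
}
```

The exchange is what makes it behave like PID.ON: a post that races 
with the HLT is either seen by this HLT or left set for the next one, 
never lost.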

>>> +		details.full = td_state_non_arch_read64(
>>> +			to_tdx(vcpu), TD_VCPU_STATE_DETAILS_NON_ARCH);
>>
>> TDX documentation says "the meaning of the field may change with Intel TDX
>> module version", where is this field documented?  I cannot find any "other
>> guest state" fields in the TDX documentation.
> 
> IMO we should put a stake in the ground and refuse to accept code that consumes
> "non-architectural" state.  It's all software, having non-architectural APIs is
> completely ridiculous.

Having them is fine, *using* them to work around undocumented bugs is 
the ridiculous part.

You didn't answer the other question, which is "Where is R12 documented 
for TDG.VP.VMCALL<Instruction.HLT>?" though...  Should I be worried? :)


Paolo