linux-kernel - Re: [PATCH v3 1/2] KVM: x86: Check hypercall's exit to userspace generically

Message-ID: <a441733d-1885-42f0-a046-2c8871c46d6b@linux.intel.com>
Date: Mon, 4 Nov 2024 17:03:34 +0800
From: Binbin Wu <binbin.wu@...ux.intel.com>
To: "Huang, Kai" <kai.huang@...el.com>, "seanjc@...gle.com"
 <seanjc@...gle.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
 "Li, Xiaoyao" <xiaoyao.li@...el.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
 "pbonzini@...hat.com" <pbonzini@...hat.com>,
 "Yamahata, Isaku" <isaku.yamahata@...el.com>,
 "yuan.yao@...ux.intel.com" <yuan.yao@...ux.intel.com>
Subject: Re: [PATCH v3 1/2] KVM: x86: Check hypercall's exit to userspace
 generically




On 11/2/2024 5:13 AM, Huang, Kai wrote:
> On Fri, 2024-11-01 at 09:39 -0700, Sean Christopherson wrote:
>> On Fri, Nov 01, 2024, Kai Huang wrote:
>>> On Thu, 2024-10-31 at 07:54 -0700, Sean Christopherson wrote:
>>>> On Thu, Oct 31, 2024, Kai Huang wrote:
>>>> -	ret = __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, op_64_bit, cpl);
>>>> -	if (nr == KVM_HC_MAP_GPA_RANGE && !ret)
>>>> -		/* MAP_GPA tosses the request to the user space. */
>>>> -		return 0;
>>>> +	r = __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, op_64_bit, cpl, &ret);
>>>> +	if (r <= r)
>>>> +		return r;
>>> ... should be:
>>>
>>> 	if (r <= 0)
>>> 		return r;
>>>
>>> ?
>>>
>>> Another option might be we move "set hypercall return value" code inside
>>> __kvm_emulate_hypercall().  So IIUC the reason to split
>>> __kvm_emulate_hypercall() out is for TDX, and while non-TDX uses RAX to carry
>>> the hypercall return value, TDX uses R10.
>>>
>>> We can additionally pass a "kvm_hypercall_set_ret_func" function pointer to
>>> __kvm_emulate_hypercall(), and invoke it inside.  Then we can change
>>> __kvm_emulate_hypercall() to return:
>>>      < 0 error,
>>>      ==0 return to userspace,
>>>      > 0 go back to guest.
>> Hmm, and the caller can still handle kvm_skip_emulated_instruction(), because the
>> return value is KVM's normal pattern.
>>
>> I like it!
>>
>> But, there's no need to pass a function pointer, KVM can write (and read) arbitrary
>> GPRs, it's just avoided in most cases so that the sanity checks and available/dirty
>> updates are elided.  For this code though, it's easy enough to keep kvm_rxx_read()
>> for getting values, and eating the overhead of a single GPR write is a perfectly
>> fine tradeoff for eliminating the return multiplexing.
>>
>> Lightly tested.  Assuming this works for TDX and passes testing, I'll post a
>> mini-series next week.
>>
>> --
>> From: Sean Christopherson <seanjc@...gle.com>
>> Date: Fri, 1 Nov 2024 09:04:00 -0700
>> Subject: [PATCH] KVM: x86: Refactor __kvm_emulate_hypercall() to accept reg
>>   names, not values
>>
>> Rework __kvm_emulate_hypercall() to take the names of input and output
>> (guest return value) registers, as opposed to taking the input values and
>> returning the output value.  As part of the refactor, change the actual
>> return value from __kvm_emulate_hypercall() to be KVM's de facto standard
>> of '0' == exit to userspace, '1' == resume guest, and -errno == failure.
>>
>> Using the return value for KVM's control flow eliminates the multiplexed
>> return value, where '0' for KVM_HC_MAP_GPA_RANGE (and only that hypercall)
>> means "exit to userspace".
>>
>> Use the direct GPR accessors to read values to avoid the pointless marking
>> of the registers as available, but use kvm_register_write_raw() for the
>> guest return value so that the innermost helper doesn't need to multiplex
>> its return value.  Using the generic kvm_register_write_raw() adds very
>> minimal overhead, so as a one-off in a relatively slow path it's well
>> worth the code simplification.
> Ah right :-)
>
>> Suggested-by: Kai Huang <kai.huang@...el.com>
>> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
>> ---
> I think Binbin can help to test on TDX, and assuming it works,
I tried to add a selftest case that does memory conversion via the KVM
hypercall directly for TDX, and found that the TDX code didn't handle the
hypercall's return value properly.

I tried adding a parameter to pass the cui (complete_userspace_io) callback,
as mentioned in
https://lore.kernel.org/lkml/f95cd8c6-af5c-4d8f-99a8-16d0ec56d9a4@linux.intel.com/
and then made the following change in the TDX code to make it work.

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index cd27ebd3d7d1..efa434c6547d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1072,6 +1072,15 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
         return 0;
  }

+static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
+{
+       u64 ret = vcpu->run->hypercall.ret;
+
+       kvm_r10_write(vcpu, ret);
+       ++vcpu->stat.hypercalls;
+
+       return 1;
+}
+
  static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
  {
         int r;
@@ -1087,7 +1096,7 @@ static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
          * R10: KVM hypercall number
          * arguments: R11, R12, R13, R14.
          */
-       r = __kvm_emulate_hypercall(vcpu, r10, r11, r12, r13, r14, true, 0, R10);
+       r = __kvm_emulate_hypercall(vcpu, r10, r11, r12, r13, r14, true, 0, R10, complete_hypercall_exit);

         return r > 0;
  }
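
For illustration only, here is a minimal userspace sketch of the flow the
change above relies on: the hypercall helper returns 0 to exit to userspace
and records a completion callback, and that callback later writes the value
supplied by userspace into R10 and resumes the guest. Every mock_* name and
type below is a hypothetical stand-in made up for this sketch, not a kernel
API, and the real KVM code paths may well differ in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for KVM state; none of this is kernel code. */
enum mock_reg { MOCK_RAX, MOCK_R10, MOCK_NR_REGS };
enum mock_hc  { MOCK_HC_NOP, MOCK_HC_MAP_GPA_RANGE };

struct mock_vcpu;
typedef int (*mock_cui_t)(struct mock_vcpu *vcpu);

struct mock_vcpu {
	uint64_t regs[MOCK_NR_REGS];
	uint64_t userspace_ret;           /* role of vcpu->run->hypercall.ret */
	uint64_t hypercalls;              /* role of vcpu->stat.hypercalls */
	mock_cui_t complete_userspace_io; /* pending completion callback */
};

void mock_reg_write(struct mock_vcpu *vcpu, enum mock_reg reg, uint64_t val)
{
	assert(reg < MOCK_NR_REGS);
	vcpu->regs[reg] = val;
}

/* TDX-flavoured completion: the guest-visible result lands in R10. */
int mock_complete_hypercall_exit_r10(struct mock_vcpu *vcpu)
{
	mock_reg_write(vcpu, MOCK_R10, vcpu->userspace_ret);
	++vcpu->hypercalls;
	return 1; /* resume the guest */
}

/*
 * Returns < 0 on error, 0 to exit to userspace, > 0 to resume the guest.
 * ret_reg names the register that carries the guest-visible result.
 */
int mock_emulate_hypercall(struct mock_vcpu *vcpu, enum mock_hc nr,
			   enum mock_reg ret_reg, mock_cui_t cui)
{
	switch (nr) {
	case MOCK_HC_MAP_GPA_RANGE:
		if (!cui)
			return -EINVAL;
		/* Userspace handles the request; finish up on re-entry. */
		vcpu->complete_userspace_io = cui;
		return 0;
	case MOCK_HC_NOP:
		mock_reg_write(vcpu, ret_reg, 0);
		++vcpu->hypercalls;
		return 1;
	default:
		/* Unknown hypercall: report failure to the guest and resume. */
		mock_reg_write(vcpu, ret_reg, (uint64_t)(int64_t)-ENOSYS);
		++vcpu->hypercalls;
		return 1;
	}
}

/* Models re-entry after userspace has filled in its return value. */
int mock_resume_from_userspace(struct mock_vcpu *vcpu, uint64_t ret)
{
	mock_cui_t cui = vcpu->complete_userspace_io;

	if (!cui)
		return -EINVAL;
	vcpu->userspace_ret = ret;
	vcpu->complete_userspace_io = NULL;
	return cui(vcpu);
}
```

In this sketch, a MOCK_HC_MAP_GPA_RANGE call returns 0 without touching R10
or the hypercall counter; both are only updated once
mock_resume_from_userspace() runs the recorded callback.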


>
> Reviewed-by: Kai Huang <kai.huang@...el.com>
>