Message-ID: <af8bbddc-fcf5-460b-9a6f-1418a0748f37@intel.com>
Date: Thu, 15 Jan 2026 15:47:24 +0800
From: Xiaoyao Li <xiaoyao.li@...el.com>
To: Sagi Shahar <sagis@...gle.com>, Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>, Kiryl Shutsemau <kas@...nel.org>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
Thomas Gleixner <tglx@...nel.org>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-coco@...ts.linux.dev,
Vishal Annapurve <vannapurve@...gle.com>
Subject: Re: [PATCH] KVM: TDX: Allow userspace to return errors to guest for
MAPGPA
On 1/15/2026 9:21 AM, Sagi Shahar wrote:
> On Wed, Jan 14, 2026 at 9:57 AM Sean Christopherson <seanjc@...gle.com> wrote:
>>
>> On Wed, Jan 14, 2026, Xiaoyao Li wrote:
>>> On 1/14/2026 8:30 AM, Sagi Shahar wrote:
>>>> From: Vishal Annapurve <vannapurve@...gle.com>
>>>>
>>>> MAPGPA request from TDX VMs gets split into chunks by KVM using a loop
>>>> of userspace exits until the complete range is handled.
>>>>
>>>> In some cases userspace VMM might decide to break the MAPGPA operation
>>>> and continue it later. For example: in the case of intrahost migration
>>>> userspace might decide to continue the MAPGPA operation after the
>>>> migrration is completed
>>
>> migration
>>
>>>> Allow userspace to signal to TDX guests that the MAPGPA operation should
>>>> be retried the next time the guest is scheduled.
>>
>> To Xiaoyao's point, changes like this either need new uAPI, or a detailed
>> explanation in the changelog of why such uAPI isn't deemed necessary.
>>
>>>> Signed-off-by: Vishal Annapurve <vannapurve@...gle.com>
>>>> Co-developed-by: Sagi Shahar <sagis@...gle.com>
>>>> Signed-off-by: Sagi Shahar <sagis@...gle.com>
>>>> ---
>>>> arch/x86/kvm/vmx/tdx.c | 8 +++++++-
>>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>>>> index 2d7a4d52ccfb..3244064b1a04 100644
>>>> --- a/arch/x86/kvm/vmx/tdx.c
>>>> +++ b/arch/x86/kvm/vmx/tdx.c
>>>> @@ -1189,7 +1189,13 @@ static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu)
>>>>          struct vcpu_tdx *tdx = to_tdx(vcpu);
>>>>          if (vcpu->run->hypercall.ret) {
>>>> -                tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
>>>> +                if (vcpu->run->hypercall.ret == -EBUSY)
>>>> +                        tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY);
>>>> +                else if (vcpu->run->hypercall.ret == -EINVAL)
>>>> +                        tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
>>>> +                else
>>>> +                        return -EINVAL;
>>>
>>> It's incorrect to return -EINVAL here.
>>
>> It's not incorrect, just potentially a breaking change.
>>
>>> The -EINVAL will eventually be
>>> returned to userspace for the VCPU_RUN ioctl. It certainly breaks userspace.
>>
>> It _might_ break userspace. It certainly changes KVM's ABI, but if no userspace
>> actually utilizes the existing ABI, then userspace hasn't been broken.
>>
>> And unless I'm missing something, QEMU _still_ doesn't set hypercall.ret. E.g.
>> see this code in __tdx_map_gpa().
>>
>> /*
>> * In principle this should have been -KVM_ENOSYS, but userspace (QEMU <=9.2)
>> * assumed that vcpu->run->hypercall.ret is never changed by KVM and thus that
>> * it was always zero on KVM_EXIT_HYPERCALL. Since KVM is now overwriting
>> * vcpu->run->hypercall.ret, ensuring that it is zero to not break QEMU.
>> */
>> tdx->vcpu.run->hypercall.ret = 0;
>>
>> AFAICT, QEMU kills the VM if anything goes wrong.
>>
>> So while I initially had the exact same reaction of "this is a breaking change
>> and needs to be opt-in", we might actually be able to get away with just making
>> the change (assuming no other VMMs care, or are willing to change themselves).
>
> Is there a better source of truth for whether QEMU uses hypercall.ret
> or just point to this comment in the commit message.
Judging from the source code, no version of QEMU touches hypercall.ret.
I suggest not mentioning the comment, because it only says that QEMU expects
vcpu->run->hypercall.ret to be 0 on KVM_EXIT_HYPERCALL. What matters is that
QEMU never sets vcpu->run->hypercall.ret to a non-zero value after
handling KVM_EXIT_HYPERCALL. I think you can simply state in the commit
message that QEMU never sets vcpu->run->hypercall.ret to a non-zero value.
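
For anyone following the thread, the ABI change being discussed can be
sketched as a small standalone C function. This is only an illustration
of the mapping in the quoted diff, not the kernel code: every sketch_*
name and enum value below is hypothetical, and the real status constants
and the real completion handler live in the kernel sources. The one
detail taken from the source is that hypercall.ret is compared against
negative errno values, which the sketch models with an unsigned 64-bit
field.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TDVMCALL status codes (values are placeholders). */
enum sketch_status {
	SKETCH_STATUS_SUCCESS,
	SKETCH_STATUS_RETRY,
	SKETCH_STATUS_INVALID_OPERAND,
};

/* What the completion handler does next (hypothetical names). */
enum sketch_action {
	SKETCH_CONTINUE,        /* ret == 0: keep processing the MAPGPA range */
	SKETCH_RETURN_TO_GUEST, /* a status was chosen for the guest */
	SKETCH_FAIL_KVM_RUN,    /* error is reported to userspace instead */
};

/*
 * Mirror of the mapping in the quoted diff:
 *   -EBUSY  -> guest sees "retry"
 *   -EINVAL -> guest sees "invalid operand"
 *   other   -> the run ioctl fails (the potentially ABI-breaking case)
 */
enum sketch_action sketch_map_gpa_ret(uint64_t hypercall_ret,
				      enum sketch_status *status)
{
	if (!hypercall_ret)
		return SKETCH_CONTINUE;

	if (hypercall_ret == (uint64_t)-EBUSY) {
		*status = SKETCH_STATUS_RETRY;
		return SKETCH_RETURN_TO_GUEST;
	}

	if (hypercall_ret == (uint64_t)-EINVAL) {
		*status = SKETCH_STATUS_INVALID_OPERAND;
		return SKETCH_RETURN_TO_GUEST;
	}

	return SKETCH_FAIL_KVM_RUN;
}

/* Small self-check covering the four cases. */
void sketch_demo(void)
{
	enum sketch_status st = SKETCH_STATUS_SUCCESS;

	assert(sketch_map_gpa_ret(0, &st) == SKETCH_CONTINUE);
	assert(sketch_map_gpa_ret((uint64_t)-EBUSY, &st) == SKETCH_RETURN_TO_GUEST);
	assert(st == SKETCH_STATUS_RETRY);
	assert(sketch_map_gpa_ret((uint64_t)-EINVAL, &st) == SKETCH_RETURN_TO_GUEST);
	assert(st == SKETCH_STATUS_INVALID_OPERAND);
	assert(sketch_map_gpa_ret((uint64_t)-ENOMEM, &st) == SKETCH_FAIL_KVM_RUN);
}
```

The last branch is the one at issue: before the patch any non-zero value
was turned into an invalid-operand status for the guest, whereas with the
patch an unrecognized value fails the run ioctl. A VMM that never writes
a non-zero value to hypercall.ret only ever takes the first branch and is
unaffected either way.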