linux-kernel - Re: [PATCH] KVM: TDX: Allow userspace to return errors to guest for MAPGPA

Message-ID: <aWe8zESCJ0ZeAOT3@google.com>
Date: Wed, 14 Jan 2026 07:57:00 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Xiaoyao Li <xiaoyao.li@...el.com>
Cc: Sagi Shahar <sagis@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, Kiryl Shutsemau <kas@...nel.org>, 
	Rick Edgecombe <rick.p.edgecombe@...el.com>, Thomas Gleixner <tglx@...nel.org>, 
	Borislav Petkov <bp@...en8.de>, "H. Peter Anvin" <hpa@...or.com>, x86@...nel.org, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-coco@...ts.linux.dev, 
	Vishal Annapurve <vannapurve@...gle.com>
Subject: Re: [PATCH] KVM: TDX: Allow userspace to return errors to guest for MAPGPA

On Wed, Jan 14, 2026, Xiaoyao Li wrote:
> On 1/14/2026 8:30 AM, Sagi Shahar wrote:
> > From: Vishal Annapurve <vannapurve@...gle.com>
> > 
> > MAPGPA request from TDX VMs gets split into chunks by KVM using a loop
> > of userspace exits until the complete range is handled.
> > 
> > In some cases userspace VMM might decide to break the MAPGPA operation
> > and continue it later. For example: in the case of intrahost migration
> > userspace might decide to continue the MAPGPA operation after the
> > migrration is completed

migration

> > Allow userspace to signal to TDX guests that the MAPGPA operation should
> > be retried the next time the guest is scheduled.

To Xiaoyao's point, changes like this either need new uAPI, or a detailed
explanation in the changelog of why such uAPI isn't deemed necessary.

> > Signed-off-by: Vishal Annapurve <vannapurve@...gle.com>
> > Co-developed-by: Sagi Shahar <sagis@...gle.com>
> > Signed-off-by: Sagi Shahar <sagis@...gle.com>
> > ---
> >   arch/x86/kvm/vmx/tdx.c | 8 +++++++-
> >   1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 2d7a4d52ccfb..3244064b1a04 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -1189,7 +1189,13 @@ static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu)
> >   	struct vcpu_tdx *tdx = to_tdx(vcpu);
> >   	if (vcpu->run->hypercall.ret) {
> > -		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
> > +		if (vcpu->run->hypercall.ret == -EBUSY)
> > +			tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY);
> > +		else if (vcpu->run->hypercall.ret == -EINVAL)
> > +			tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
> > +		else
> > +			return -EINVAL;
> 
> It's incorrect to return -EINVAL here. 

It's not incorrect, just potentially a breaking change.

> The -EINVAL will eventually be
> returned to userspace for the VCPU_RUN ioctl. It certainly breaks userspace.

It _might_ break userspace.  It certainly changes KVM's ABI, but if no userspace
actually utilizes the existing ABI, then userspace hasn't been broken.

And unless I'm missing something, QEMU _still_ doesn't set hypercall.ret.  E.g.
see this code in __tdx_map_gpa().

	/*
	 * In principle this should have been -KVM_ENOSYS, but userspace (QEMU <=9.2)
	 * assumed that vcpu->run->hypercall.ret is never changed by KVM and thus that
	 * it was always zero on KVM_EXIT_HYPERCALL.  Since KVM is now overwriting
	 * vcpu->run->hypercall.ret, ensuring that it is zero to not break QEMU.
	 */
	tdx->vcpu.run->hypercall.ret = 0;

AFAICT, QEMU kills the VM if anything goes wrong.

So while I initially had the exact same reaction of "this is a breaking change
and needs to be opt-in", we might actually be able to get away with just making
the change (assuming no other VMMs care, or are willing to change themselves).
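
For illustration only, here is roughly what the userspace side could look like
under the -errno scheme proposed in this patch.  This is an untested, standalone
sketch: struct sketch_hypercall is a made-up stand-in for the hypercall fields of
struct kvm_run, and sketch_handle_map_gpa() plus its "migrating" and "range_ok"
inputs are hypothetical, not code from QEMU or any other VMM.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the hypercall fields of struct kvm_run. */
struct sketch_hypercall {
	uint64_t nr;
	uint64_t args[6];
	uint64_t ret;
};

/*
 * Hypothetical VMM hook for a MAPGPA exit.  Under the proposed ABI, userspace
 * reports the outcome by writing 0 or a negative errno into hypercall.ret
 * before re-entering the vCPU.
 */
static void sketch_handle_map_gpa(struct sketch_hypercall *hc,
				  bool migrating, bool range_ok)
{
	if (migrating)
		hc->ret = (uint64_t)-EBUSY;	/* ask the guest to retry later */
	else if (!range_ok)
		hc->ret = (uint64_t)-EINVAL;	/* bad range, report invalid operand */
	else
		hc->ret = 0;			/* conversion done, keep going */
}
```

A VMM that never writes hypercall.ret (which is what QEMU appears to do today)
would be unaffected, since the field stays zero.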

> So it needs to be
> 
> 	if (vcpu->run->hypercall.ret == -EBUSY)
> 		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY);
> 	else
> 		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);

No, because assuming everything except -EBUSY translates to
TDVMCALL_STATUS_INVALID_OPERAND paints KVM back into the same corner it's already
in.  What I care most about is eliminating KVM's assumption that a non-zero
hypercall.ret means TDVMCALL_STATUS_INVALID_OPERAND.

For the new ABI, I see two options:

 1. Translate -errno as done in this patch.
 2. Propagate hypercall.ret directly to the TDVMCALL return code, i.e. let
    userspace set any return code it wants.

#1 has the downside of needing KVM changes and new uAPI every time a new return
code is supported.

#2 has the downside of preventing KVM from establishing its own ABI around the
return code, and making the return code vendor specific.  E.g. if KVM ever wanted
to do something in response to -EBUSY beyond propagating the error to the guest,
then we can't reasonably do that with #2.
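
To make the difference concrete, a rough and untested sketch of the two options
as standalone C (not kernel code).  The SKETCH_STATUS_* values and both function
names are placeholders invented for this example, and the chunking loop that
continues the MAPGPA operation when hypercall.ret is zero is omitted.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder status codes; the real ones live in the kernel's TDX headers. */
#define SKETCH_STATUS_SUCCESS		0x0ULL
#define SKETCH_STATUS_RETRY		0x1ULL
#define SKETCH_STATUS_INVALID_OPERAND	0x8000000000000000ULL

/*
 * Option #1: KVM translates a fixed set of -errno values and rejects anything
 * it doesn't know.  Returns 0 and fills *status, or -EINVAL for an unknown
 * value (which would end up as the return of KVM_RUN).
 */
static int sketch_translate_errno(uint64_t ret, uint64_t *status)
{
	if (ret == 0)
		*status = SKETCH_STATUS_SUCCESS;
	else if (ret == (uint64_t)-EBUSY)
		*status = SKETCH_STATUS_RETRY;
	else if (ret == (uint64_t)-EINVAL)
		*status = SKETCH_STATUS_INVALID_OPERAND;
	else
		return -EINVAL;

	return 0;
}

/*
 * Option #2: KVM forwards whatever userspace wrote.  Nothing is rejected, but
 * KVM also has no say in what the values mean.
 */
static int sketch_passthrough(uint64_t ret, uint64_t *status)
{
	*status = ret;
	return 0;
}
```

With #1, supporting a new return code means adding another branch (and
documenting it); with #2, userspace is responsible for writing a value the
guest understands.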

Whatever we do, I want to change snp_complete_psc_msr() and snp_complete_one_psc()
in the same patch, so that whatever ABI we establish is common to TDX and SNP.

See also https://lore.kernel.org/all/Zn8YM-s0TRUk-6T-@google.com.

> But I'm not sure if such change breaks the userspace ABI that if needs to be
> opted-in.