linux-kernel - Re: [PATCH v12 20/23] KVM: selftests: Add support for TDX TDCALL from guest

Message-ID: <aQTSdk3JtFu1qOMj@google.com>
Date: Fri, 31 Oct 2025 08:15:15 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Ira Weiny <ira.weiny@...el.com>
Cc: Sagi Shahar <sagis@...gle.com>, linux-kselftest@...r.kernel.org, 
	Paolo Bonzini <pbonzini@...hat.com>, Shuah Khan <shuah@...nel.org>, 
	Ackerley Tng <ackerleytng@...gle.com>, Ryan Afranji <afranji@...gle.com>, 
	Andrew Jones <ajones@...tanamicro.com>, Isaku Yamahata <isaku.yamahata@...el.com>, 
	Erdem Aktas <erdemaktas@...gle.com>, Rick Edgecombe <rick.p.edgecombe@...el.com>, 
	Roger Wang <runanwang@...gle.com>, Binbin Wu <binbin.wu@...ux.intel.com>, 
	Oliver Upton <oliver.upton@...ux.dev>, "Pratik R. Sampat" <pratikrajesh.sampat@....com>, 
	Reinette Chatre <reinette.chatre@...el.com>, Chao Gao <chao.gao@...el.com>, 
	Chenyi Qiang <chenyi.qiang@...el.com>, linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH v12 20/23] KVM: selftests: Add support for TDX TDCALL from guest

On Fri, Oct 31, 2025, Ira Weiny wrote:
> Sagi Shahar wrote:
> > From: Erdem Aktas <erdemaktas@...gle.com>
> > 
> > Add support for TDX guests to issue TDCALLs to the TDX module.
> 
> Generally it is nice to have more details.  As someone new to TDX I
> have to remind myself what a TDCALL is.  And any random kernel developer
> reading this in the future will likely have even less clue than me.
> 
> Paraphrased from the spec:
> 
> TDCALL is the instruction used by the guest TD software (in TDX non-root
> mode) to invoke guest-side TDX functions.  TDG.VP.VMCALL helps invoke
> services from the host VMM.
> 
> Add support for TDX guests to invoke services from the host VMM.

Eh, at some point a baseline amount of knowledge is required.  I highly doubt
regurgitating the spec is going to make a huge difference.

I also dislike the above wording, because it doesn't help understand _why_ KVM
selftests need to support TDCALL, or _how_ the functionality will be utilized.
E.g. strictly speaking, we could write KVM selftests without ever doing a single
TDG.VP.VMCALL, because we control both sides (guest and VMM).  And I have a hard
time believing that name-dropping TDG.VP.VMCALL is going to connect the dots between
TDCALL and the "tunneling" scheme defined by the GHCI for requesting emulation
of "legacy" functionality.

What I would like to know is why selftests are copy-pasting the kernel's scheme
for marshalling data to/from the registers used by TDCALL, how selftests are
expected to utilize TDCALL, etc.  I'm confident that if someone actually took the
time to write a changelog explaining those details, then what TDCALL "is" will
be fairly clear, even if the reader doesn't know exactly what it is.

E.g. IMO this is ugly and lazy on multiple fronts:

uint64_t tdg_vp_vmcall_ve_request_mmio_write(uint64_t address, uint64_t size,
                                            uint64_t data_in)
{
       struct tdx_tdcall_args args = {
               .r10 = TDG_VP_VMCALL,
               .r11 = TDG_VP_VMCALL_VE_REQUEST_MMIO,
               .r12 = size,
               .r13 = MMIO_WRITE,
               .r14 = address,
               .r15 = data_in,
       };

       return __tdx_tdcall(&args, 0);
}

First, these are KVM selftests, there's no need to provide a super fancy namespace
because we are not "competing" with thousands upon thousands of lines of code from
other components and subsystems.

Similarly, tdg_vp_vmcall_ve_request_mmio_write() is absurdly verbose.  Referencing
#VE in any way is also flat out wrong.

It's also far too specific to TDX, which is going to be problematic when full
support for SEV-ES+ selftests comes along.  I.e. calling this from common code
is going to be a pain in the rear, bordering on unworkable.

And related to your comment about having enums for the sizes, there's absolutely
zero reason the caller should have to specify the size.
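
To illustrate the point about sizes, here is a minimal sketch of how a macro
could derive the access width from the type of the value being written.  The
names mmio_write_sized(), struct mmio_req and last_req are hypothetical
stand-ins for illustration only; a real backend would marshal the request into
the TDCALL registers instead of recording it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the last request, standing in for a real TDVMCALL. */
struct mmio_req {
	uint64_t addr;
	uint64_t data;
	size_t size;
};

static struct mmio_req last_req;

/* Hypothetical backend: only records the request so it can be inspected. */
static inline void mmio_write_sized(uint64_t addr, uint64_t data, size_t size)
{
	/* MMIO accesses are assumed to be 1, 2, 4 or 8 bytes wide. */
	assert(size == 1 || size == 2 || size == 4 || size == 8);
	last_req = (struct mmio_req){ .addr = addr, .data = data, .size = size };
}

/* The access width comes from the type of val, not from the caller. */
#define MMIO_WRITE(addr, val) \
	mmio_write_sized((uint64_t)(addr), (uint64_t)(val), sizeof(val))
```

With this shape, MMIO_WRITE(0x1000, v) with a uint16_t v requests a two-byte
write without the caller ever naming the size.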

In short, don't simply copy what was done for the kernel.  The kernel is operating
under constraints that do not and should not ever apply to KVM selftests.  Except
for tests like set_memory_region_test.c that delete memslots while a vCPU is running
and thus _may_ generate MMIO accesses, our selftests should never, ever take a #VE
(or #VC) and then request MMIO in the handler.  If a test wants to do MMIO, then
do MMIO.

So, I want to see GUEST_MMIO_WRITE() and GUEST_MMIO_READ(), or probably even just
MMIO_WRITE() and MMIO_READ().  And then under the hood, wire up kvm_arch_mmio_write()
and kvm_arch_mmio_read() in kvm_util_arch.h.  And from there have x86 globally track
if it's TDX, SEV-ES+, or "normal".  That'd also give us a good reason+way to assert
on s390 if a test attempts MMIO, as s390 doesn't support emulated MMIO.

One potential hiccup is if/when KVM selftests get access to actual MMIO, i.e. when
they don't want to trigger emulation, e.g. for VFIO related selftests when accessing
BARs.  Though the answer there is probably to just use WRITE/READ_ONCE() and call it
good.

E.g.

#define MMIO_WRITE(addr, val)					\
	kvm_arch_mmio_write(addr, val)

#define kvm_arch_mmio_write(addr, val)				\
({								\
	if (guest_needs_tdvmcall)				\
		tdx_mmio_write(addr, val, sizeof(val));		\
	else if (guest_needs_vmgexit)				\
		sev_mmio_write(addr, val, sizeof(val));		\
	else							\
		WRITE_ONCE(addr, val);				\
})

#define MMIO_READ(addr, val)					\
	kvm_arch_mmio_read(addr, val)

#define kvm_arch_mmio_read(addr, val)				\
({								\
	if (guest_needs_tdvmcall)				\
		tdx_mmio_read(addr, &(val), sizeof(val));	\
	else if (guest_needs_vmgexit)				\
		sev_mmio_read(addr, &(val), sizeof(val));	\
	else							\
		(val) = READ_ONCE(addr);			\
})
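
For anyone who wants to experiment with the write-side dispatch outside of a
guest, below is a rough, self-contained mock of it.  Everything beyond the names
used above is an assumption: the stub bodies, the mmio_backend enum and the
last_backend/last_size bookkeeping exist only so the chosen path can be observed,
addr is treated as a pointer, the plain store is an open-coded volatile write
rather than the real WRITE_ONCE(), and do/while (0) replaces the statement
expression to avoid depending on that GNU extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Guest-mode flags; how they get set is not covered by this sketch. */
static bool guest_needs_tdvmcall;
static bool guest_needs_vmgexit;

/* Hypothetical bookkeeping so the selected path can be checked. */
enum mmio_backend { MMIO_DIRECT, MMIO_TDX, MMIO_SEV };
static enum mmio_backend last_backend;
static size_t last_size;

/* Stub: a real version would request the write from the VMM via TDCALL. */
static void tdx_mmio_write(uint64_t addr, uint64_t val, size_t size)
{
	(void)addr;
	(void)val;
	assert(size <= sizeof(uint64_t));
	last_backend = MMIO_TDX;
	last_size = size;
}

/* Stub: a real version would request the write from the VMM via VMGEXIT. */
static void sev_mmio_write(uint64_t addr, uint64_t val, size_t size)
{
	(void)addr;
	(void)val;
	assert(size <= sizeof(uint64_t));
	last_backend = MMIO_SEV;
	last_size = size;
}

#define kvm_arch_mmio_write(addr, val)                                  \
do {                                                                    \
	if (guest_needs_tdvmcall) {                                     \
		tdx_mmio_write((uint64_t)(uintptr_t)(addr),             \
			       (uint64_t)(val), sizeof(val));           \
	} else if (guest_needs_vmgexit) {                               \
		sev_mmio_write((uint64_t)(uintptr_t)(addr),             \
			       (uint64_t)(val), sizeof(val));           \
	} else {                                                        \
		last_backend = MMIO_DIRECT;                             \
		last_size = sizeof(val);                                \
		/* Plain volatile store, standing in for WRITE_ONCE(). */ \
		*(volatile __typeof__(val) *)(addr) = (val);            \
	}                                                               \
} while (0)

#define MMIO_WRITE(addr, val) kvm_arch_mmio_write(addr, val)
```

The test code only ever calls MMIO_WRITE(); flipping guest_needs_tdvmcall or
guest_needs_vmgexit changes which backend services the access, and the width
follows from the type of the value.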