linux-kernel - Re: [PATCH v19 007/130] x86/virt/tdx: Export SEAMCALL functions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7otbchwoxaaqxoxjfqmifma27dmxxo4wlczyee5pv2ussguwyw@uqr2jbmawg6b>
Date: Fri, 19 Apr 2024 17:46:02 +0300
From: "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@...el.com>, 
	Tina Zhang <tina.zhang@...el.com>, Dave Hansen <dave.hansen@...el.com>, 
	Hang Yuan <hang.yuan@...el.com>, "x86@...nel.org" <x86@...nel.org>, 
	Kai Huang <kai.huang@...el.com>, Bo2 Chen <chen.bo@...el.com>, 
	"sagis@...gle.com" <sagis@...gle.com>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>, Erdem Aktas <erdemaktas@...gle.com>, 
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pbonzini@...hat.com" <pbonzini@...hat.com>, 
	Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [PATCH v19 007/130] x86/virt/tdx: Export SEAMCALL functions

On Thu, Apr 18, 2024 at 11:26:11AM -0700, Sean Christopherson wrote:
> On Thu, Apr 18, 2024, kirill.shutemov@...ux.intel.com wrote:
> > On Tue, Apr 16, 2024 at 07:45:18PM +0000, Edgecombe, Rick P wrote:
> > > On Wed, 2024-04-10 at 15:49 +0300, Kirill A. Shutemov wrote:
> > > > On Fri, Mar 15, 2024 at 09:33:20AM -0700, Sean Christopherson wrote:
> > > > > So my feedback is to not worry about the exports, and instead focus on
> > > > > figuring
> > > > > out a way to make the generated code less bloated and easier to read/debug.
> > > > 
> > > > I think it was mistake trying to centralize TDCALL/SEAMCALL calls into
> > > > few megawrappers. I think we can get better results by shifting leaf
> > > > function wrappers into assembly.
> > > > 
> > > > We are going to have more assembly, but it should produce better result.
> > > > Adding macros can help to write such wrapper and minimizer boilerplate.
> > > > 
> > > > Below is an example of how it can look like. It's not complete. I only
> > > > converted TDCALLs, but TDVMCALLs or SEAMCALLs. TDVMCALLs are going to be
> > > > more complex.
> > > > 
> > > > Any opinions? Is it something worth investing more time?
> > > 
> > > We discussed offline how implementing these for each TDVM/SEAMCALL increases the
> > > chances of a bug in just one TDVM/SEAMCALL. Which could making debugging
> > > problems more challenging. Kirill raised the possibility of some code generating
> > > solution like cpufeatures.h, that could take a spec and generate correct calls.
> > > 
> > > So far no big wins have presented themselves. Kirill, do we think the path to
> > > move the messy part out-of-line will not work?
> > 
> > I converted all TDCALL and TDVMCALL leafs to direct assembly wrappers.
> > Here's WIP branch: https://github.com/intel/tdx/commits/guest-tdx-asm/
> > 
> > I still need to clean it up and write commit messages and comments for all
> > wrappers.
> > 
> > Now I think it worth the shot.
> > 
> > Any feedback?
> 
> I find it hard to review for correctness, and extremely susceptible to developer
> error.  E.g. lots of copy+paste, and manual encoding of RCX to expose registers.

Yes, I agree. The approach requires careful manual work and is error-prone.
I was planning to get around and stare at every wrapper to make sure it is correct.
This approach is not scalable, though.

> It also bleeds TDX ABI into C code, e.g.
> 
> 	/*
> 	 * As per TDX GHCI CPUID ABI, r12-r15 registers contain contents of
> 	 * EAX, EBX, ECX, EDX registers after the CPUID instruction execution.
> 	 * So copy the register contents back to pt_regs.
> 	 */
> 	regs->ax = args.r12;
> 	regs->bx = args.r13;
> 	regs->cx = args.r14;
> 	regs->dx = args.r15;
> 
> Oh, and it requires input/output paramters, which is quite gross for C code *and*
> for assembly code, e.g.
> 
> 	u64 tdvmcall_map_gpa(u64 *gpa, u64 size);
> 
> and then the accompanying assembly code:
> 
> 		FRAME_BEGIN
> 
> 		save_regs r12,r13
> 
> 		movq	(%rdi), %r12
> 		movq	%rsi, %r13
> 
> 		movq	$(TDX_R10 | TDX_R11 | TDX_R12 | TDX_R13), %rcx
> 
> 		tdvmcall	$TDVMCALL_MAP_GPA
> 
> 		movq	%r11, (%rdi)
> 
> 		restore_regs r13,r12
> 
> 		FRAME_END
> 		RET
> 
> I think having one trampoline makes sense, e.g. to minimize the probability of
> leaking register state to the VMM.  The part that I don't like, and which generates
> awful code, is shoving register state into a memory structure.

I don't think we can get away with single trampoline. We have outliers.

See TDG.VP.VMCALL<ReportFatalError> that uses pretty much all registers as
input. And I hope we wouldn't need TDG.VP.VMCALL<Instruction.PCONFIG> any
time soon. It uses all possible output registers.

But I guess we can make a *few* wrappers that covers all needed cases.

> The annoying part with the TDX ABI is that it heavily uses r8-r15, and asm()
> constraints don't play nice with r8-15.  But that doesn't mean we can't use asm()
> with macros, it just means we have to play games with registers.
> 
> Because REG-REG moves are super cheap, and ignoring the fatal error goofiness,
> there are at most four inputs.  That means having a single trampoline take *all*
> possible inputs is a non-issue.  And we can avoiding polluting the inline code if
> we bury the register shuffling in the trampoline.
> 
> And if we use asm() wrappers to call the trampoline, then the trampoline doesn't
> need to precisely follow the C calling convention.  I.e. the trampoline can return
> with the outputs still in r12-r15, and let the asm() wrappers extract the outputs
> they want.
> 
> As it stands, TDVMCALLs either have 0, 1, or 4 outputs.  I.e. we only need three
> asm() wrappers.  We could get away with one wrapper, but then users of the wrappers
> would need dummy variables for inputs *and* outputs, and the outputs get gross.
> 
> Completely untested, but this is what I'm thinking.  Side topic, I think making
> "tdcall" a macro that takes a leaf is a mistake.  If/when an assembler learns what
> tdcall is, we're going to have to rewrite all of that code.  And what a coincidence,
> my suggestion needs a bare TDCALL!  :-)

I guess rename the macro (if it still needed) to something like
tdcall_leaf or something.

> Side topic #2, I don't think the trampoline needs a stack frame, its a leaf function.

Hm. I guess.

> Side topic #3, the ud2 to induce panic should be out-of-line.

Yeah. I switched to the inline one while debugging one section mismatch
issue and forgot to switch back.

> Weird?  Yeah.  But at least we one need to document one weird calling convention,
> and the ugliness is contained to three macros and a small assembly function.

Okay, the approach is worth exploring. I can work on it.

You focuses here on TDVMCALL. What is your take on the rest of TDCALL?

> .pushsection .noinstr.text, "ax"
> SYM_FUNC_START(tdvmcall_trampoline)
> 	movq	$TDX_HYPERCALL_STANDARD, %r10
> 	movq	%rax, %r11
> 	movq	%rdi, %r12
> 	movq	%rsi, %r13
> 	movq	%rdx, %r14
> 	movq	%rcx, %r15
> 
> 	movq	$(TDX_R10 | TDX_R11 | TDX_R12 | TDX_R13 | TDX_R14 | TDX_R15), %rcx
> 
> 	tdcall
> 
> 	testq	%rax, %rax
> 	jnz	.Lpanic
> 
> 	ret
> 
> .Lpanic:
> 	ud2
> SYM_FUNC_END(tdvmcall_trampoline)
> .popsection
> 
> 
> #define TDVMCALL(reason, in1, in2, in3, in4)				\
> ({									\
> 	long __ret;							\
> 									\
> 	asm(								\
> 		"call tdvmcall_trampoline\n\t"				\
> 		"mov %%r10, %0\n\t"					\
> 		: "=r" (__ret)						\
> 		: "D" (in1), "S"(in2), "d"(in3), "c" (in4)		\
> 		: "r12", "r13", "r14", "r15"				\
> 	);								\
> 	__ret;								\
> })
> 
> #define TDVMCALL_1(reason, in1, in2, in3, in4, out1)			\
> ({									\
> 	long __ret;							\
> 									\
> 	asm(								\
> 		"call tdvmcall_trampoline\n\t"				\
> 		"mov %%r10, %0\n\t"					\
> 		"mov %%r12, %1\n\t"					\

It is r11, not r12.

> 		: "=r"(__ret) "=r" (out1)				\
> 		: "a"(reason), "D" (in1), "S"(in2), "d"(in3), "c" (in4)	\
> 		: "r12", "r13", "r14", "r15"				\
> 	);								\
> 	__ret;								\
> })
> 
> #define TDVMCALL_4(reason, in1, in2, in3, in4, out1, out2, out3, out4)	\
> ({									\
> 	long __ret;							\
> 									\
> 	asm(								\
> 		"call tdvmcall_trampoline\n\t"				\
> 		"mov %%r10, %0\n\t"					\
> 		"mov %%r12, %1\n\t"					\
> 		"mov %%r13, %2\n\t"					\
> 		"mov %%r14, %3\n\t"					\
> 		"mov %%r15, %4\n\t"					\
> 		: "=r" (__ret),						\
> 		  "=r" (out1), "=r" (out2), "=r" (out3), "=r" (out4)	\
> 		: "a"(reason), "D" (in1), "S"(in2), "d"(in3), "c" (in4)	\
> 		  [reason] "i" (reason) 				\
> 		: "r12", "r13", "r14", "r15"				\
> 	);								\
> 	__ret;								\
> })
> 
> static int handle_halt(struct ve_info *ve)
> {
> 	if (TDVMCALL(EXIT_REASON_HALT, irqs_disabled(), 0, 0, 0))
> 		return -EIO;
> 
> 	return ve_instr_len(ve);
> }
> 
> void __cpuidle tdx_safe_halt(void)
> {
> 	WARN_ONCE(TDVMCALL(EXIT_REASON_HALT, false, 0, 0, 0),
> 		  "HLT instruction emulation failed");
> }
> 
> static int read_msr(struct pt_regs *regs, struct ve_info *ve)
> {
> 	u64 val;
> 
> 	if (TDVMCALL_1(EXIT_REASON_MSR_READ, regs->cx, 0, 0, 0, val))
> 		return -EIO;
> 
> 	regs->ax = lower_32_bits(val);
> 	regs->dx = upper_32_bits(val);
> 
> 	return ve_instr_len(ve);
> }
> 
> static int write_msr(struct pt_regs *regs, struct ve_info *ve)
> {
> 	u64 val = (u64)regs->dx << 32 | regs->ax;
> 
> 	if (TDVMCALL(EXIT_REASON_MSR_WRITE, regs->cx, val, 0, 0))
> 		return -EIO;
> 
> 	return ve_instr_len(ve);
> }
> static int handle_cpuid(struct pt_regs *regs, struct ve_info *ve)
> {
> 	/*
> 	 * Only allow VMM to control range reserved for hypervisor
> 	 * communication.
> 	 *
> 	 * Return all-zeros for any CPUID outside the range. It matches CPU
> 	 * behaviour for non-supported leaf.
> 	 */
> 	if (regs->ax < 0x40000000 || regs->ax > 0x4FFFFFFF) {
> 		regs->ax = regs->bx = regs->cx = regs->dx = 0;
> 		return ve_instr_len(ve);
> 	}
> 
> 	if (TDVMCALL_4(EXIT_REASON_CPUID, regs->ax, regs->cx, 0, 0,
> 		       regs->ax, regs->bx, regs->cx, regs->dx))
> 		return -EIO;
> 
> 	return ve_instr_len(ve);
> }
> 
> static bool mmio_read(int size, u64 gpa, u64 *val)
> {
> 	*val = 0;
> 	return !TDVMCALL_1(EXIT_REASON_EPT_VIOLATION, size, EPT_READ, gpa, 0, val);
> }
> 
> static bool mmio_write(int size, u64 gpa, u64 val)
> {
> 	return !TDVMCALL(EXIT_REASON_EPT_VIOLATION, size, EPT_WRITE, gpa, val);
> }

-- 
  Kiryl Shutsemau / Kirill A. Shutemov