linux-kernel - Re: [PATCH] x86/tdx: Generate SIGBUS on userspace MMIO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ajxusvtisufqbl644hkdnq6c3tpamoppvjifhbj4nqctl4vyt7@nhak7gkergqf>
Date: Mon, 10 Jun 2024 17:31:14 +0300
From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>, 
	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>, linux-coco@...ts.linux.dev, 
	linux-kernel@...r.kernel.org, cho@...rosoft.com, decui@...rosoft.com, 
	John.Starks@...rosoft.com
Subject: Re: [PATCH] x86/tdx: Generate SIGBUS on userspace MMIO

On Mon, Jun 10, 2024 at 06:55:56AM -0700, Dave Hansen wrote:
> On 5/28/24 03:09, Kirill A. Shutemov wrote:
> > Currently, attempting to perform MMIO from userspace in a TDX guest
> > leads to a warning about an unexpected #VE and SIGSEGV being delivered
> > to the process.
> 
> Does it _always_ result in a #VE?  Or is this only when guests mmap()
> something like from a driver and the host doesn't back the shared memory?

See below.

> > Enlightened userspace may choose to handle MMIO on their own if the
> > kernel does not emulate it.
> > 
> > Handle the EPT_VIOLATION exit reason for userspace and deliver SIGBUS
> > instead of SIGSEGV. SIGBUS is more appropriate for the MMIO situation.
> 
> Is any userspace _actually_ doing this?  Sure, SIGBUS is more
> appropriate but in practice unprepared userspace crashes either way.

Microsoft folks have plans to do this. I don't know if any current code
handles SIGBUS this way.

> > @@ -641,17 +647,20 @@ static int virt_exception_user(struct pt_regs *regs, struct ve_info *ve)
> >  	switch (ve->exit_reason) {
> >  	case EXIT_REASON_CPUID:
> >  		return handle_cpuid(regs, ve);
> > +	case EXIT_REASON_EPT_VIOLATION:
> > +		if (is_private_gpa(ve->gpa))
> > +			panic("Unexpected EPT-violation on private memory.");
> > +
> > +		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)ve->gla);
> > +
> > +		/* Return 0 to avoid incrementing RIP */
> > +		return 0;
> 
> This _really_ needs a comment, probably even a helper function where you
> can actually explain what is going on.
> 
> I could barely remember what this is for today.  There's no hope for me
> in a couple of years.
> 
> Just thinking through the possibilities here:
> 
> Private=> Private      	: no #VE
> Private=> Anything else	: fatal shutdown
> Shared => Shared	: no #VE
> Shared => Private	: #VE (end up here)
> Shared => !Present      : #VE (end up here)

Are you talking about page state vs. page mapping here?

It is wrong frame to think about it.

We get here as result of EPT violation. Either shared or secret EPT.
So we don't have an present EPT entry for the page or allowed permissions
doesn't match the access.

The is_private_gpa() check catches cases when private EPT doesn't have a
valid entry for page accessed via private mapping: page is not accepted or
removed by VMM. This case is only reachable for debug-TD. In for non-debug
TD it leads to unrecoverable TD exit. The same story as for kernel
addresses.

Normal shared memory doesn't cause #VE even if the memory was not
converted to shared explicitly. On the first access to a page via shared
mapping VMM will allocate a new page and fill EPT[*]. Except for GPA
ranges dedicated for MMIO. For these VMM will not fill EPT and it causes
#VE which interpreted as MMIO access.

Note that only emulated devices require such mechanism to get MMIO
working. For device passthrough, device MMIO range mapped directly into
the guest and handled transparently for the guest kernel.

Does it make sense?

[*] I don't like this implicit conversion to shared. I would prefer #VE
here too, but it is what we have.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov