Message-ID: <YGZIks0DsfPS2IMk@google.com>
Date:   Thu, 1 Apr 2021 22:26:26 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Kirill Shutemov <kirill.shutemov@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan <knsathya@...nel.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Raj Ashok <ashok.raj@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC v1 12/26] x86/tdx: Handle in-kernel MMIO

On Thu, Apr 01, 2021, Dave Hansen wrote:
> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
> > 
> > Handle #VE due to MMIO operations. MMIO triggers #VE with EPT_VIOLATION
> > exit reason.
> > 
> > For now we only handle a subset of the instructions that the kernel uses
> > for MMIO operations. User-space access triggers SIGBUS.
> ..
> > +	case EXIT_REASON_EPT_VIOLATION:
> > +		ve->instr_len = tdx_handle_mmio(regs, ve);
> > +		break;
> 
> Is MMIO literally the only thing that can cause an EPT violation for TDX
> guests?

Any EPT Violation, or specifically EPT Violation #VE?  Any memory access can
cause an EPT violation, but the VMM will get the ones that lead to VM-Exit.  The
guest will only get the ones that cause #VE.

Assuming you're asking about #VE... No, any shared memory access can take a #VE
since the VMM controls the shared EPT tables and can clear the SUPPRESS_VE bit 
at any time.  But, if the VMM is friendly, #VE should be limited to MMIO.

There's also the unaccepted private memory case, but if Linux gets an option to
opt out of that, then #VE is limited to shared memory.

> Forget userspace for a minute.  #VE's from userspace are annoying, but
> fine.  We can't control what userspace does.  If an action it takes
> causes a #VE in the TDX architecture, tough cookies, the kernel must
> handle it and try to recover or kill the app.
> 
> The kernel is very different.  We know in advance (must know,
> actually...) which instructions might cause exceptions of any kind.
> That's why we have exception tables and copy_to/from_user().  That's why
> we can handle kernel page faults on userspace, but not inside spinlocks.
> 
> Binary-dependent OSes are also very different.  It's going to be natural
> for them to want to take existing, signed drivers and use them in TDX
> guests.  They might want to do something like this.
> 
> But for an OS where we have source for the *ENTIRE* thing, and where we
> have a chokepoint for MMIO accesses (arch/x86/include/asm/io.h), it
> seems like an *AWFUL* idea to:
> 1. Have the kernel set up special mappings for I/O memory
> 2. Kernel generates special instructions to access that memory
> 3. Kernel faults on that memory
> 4. Kernel cracks its own special instructions to see what they were
>    doing
> 5. Kernel calls up to host to do the MMIO
> 
> Instead of doing 2/3/4, why not just have #2 call up to the host
> directly?  This patch seems a very slow, roundabout way to do
> paravirtualized MMIO.
> 
> BTW, there's already some SEV special-casing in io.h.

I implemented #2 a while back for build_mmio_{read,write}(); I'm guessing the
code is floating around somewhere.  The gotcha is that there are nasty little
pieces of the kernel that don't use the helpers provided by io.h, e.g. the I/O
APIC code likes to access MMIO via a struct overlay, so the compiler is free to
use any instruction that satisfies the constraint.
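
To make the gotcha concrete, here is a rough user-space sketch, not kernel code.
Everything in it is hypothetical and exists only for illustration:
fake_host_mmio_read32() stands in for whatever call into the host a
paravirtualized accessor would make, sketch_readl() plays the role of an io.h
style helper, and struct fake_ioapic is an illustrative overlay (index register
at offset 0x00, data window at 0x10).  The counter shows that only the helper
path ever reaches the "host"; the overlay dereference is a plain load that the
compiler can implement with whatever instruction it likes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative overlay of a device's registers (hypothetical layout). */
struct fake_ioapic {
	uint32_t index;      /* offset 0x00 */
	uint32_t unused[3];
	uint32_t data;       /* offset 0x10 */
};

static struct fake_ioapic fake_device;  /* pretend MMIO window */
static unsigned int host_calls;         /* how often the "host" was asked */

/* Hypothetical stand-in for a call up to the host that performs the read. */
static uint32_t fake_host_mmio_read32(size_t offset)
{
	uint32_t val;

	host_calls++;
	memcpy(&val, (const unsigned char *)&fake_device + offset, sizeof(val));
	return val;
}

/* Chokepoint-style accessor: every read is routed through the host call. */
static uint32_t sketch_readl(size_t offset)
{
	assert(offset % sizeof(uint32_t) == 0);
	assert(offset + sizeof(uint32_t) <= sizeof(fake_device));
	return fake_host_mmio_read32(offset);
}

/*
 * Struct-overlay access: an ordinary memory load.  Nothing here goes through
 * sketch_readl(), so a paravirtualized helper never sees it; in a real guest
 * this is the kind of access that would still end up as a #VE.
 */
static uint32_t overlay_read_data(void)
{
	volatile struct fake_ioapic *io = &fake_device;

	return io->data;
}
```

Calling sketch_readl(0x10) bumps host_calls, while overlay_read_data() returns
the same value and leaves the counter untouched.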

The I/O APIC can and should be forced off, but dollars to donuts says there are
more special snowflakes lying in wait.  If the kernel uses an allowlist for
drivers, then in theory it should be possible to hunt down all offenders.  But
I think we'll want fallback logic to handle kernel MMIO #VEs, especially if the
kernel needs ISA cracking logic for userspace.  Without fallback logic, any MMIO
#VE from the kernel would be fatal, which is too harsh IMO since the behavior
isn't so obviously wrong, e.g. versus the split lock #AC purge where there's no
legitimate reason for the kernel to generate a split lock.
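
As a rough sketch of the policy being argued for, assuming a handler that
already knows whether the #VE came from userspace and whether the faulting
instruction could be decoded: the names below (struct ve_sketch,
mmio_ve_policy(), the VE_ACT_* values, the kernel_fallback switch) are all
made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for an MMIO #VE. */
enum ve_action {
	VE_ACT_EMULATE,   /* decode the instruction and ask the host to do the MMIO */
	VE_ACT_SIGBUS,    /* userspace access: signal the task */
	VE_ACT_FATAL,     /* kernel access that cannot be handled */
};

/* Hypothetical summary of what the handler knows about the event. */
struct ve_sketch {
	bool from_user;      /* #VE was raised while running userspace */
	bool insn_decoded;   /* the faulting instruction was one the decoder supports */
};

/*
 * Decide what to do with an MMIO #VE.  With kernel_fallback disabled, every
 * kernel-originated MMIO #VE is fatal; with it enabled, only accesses the
 * decoder cannot handle are.
 */
static enum ve_action mmio_ve_policy(const struct ve_sketch *ve,
				     bool kernel_fallback)
{
	assert(ve);

	if (ve->from_user)
		return VE_ACT_SIGBUS;
	if (!kernel_fallback)
		return VE_ACT_FATAL;
	return ve->insn_decoded ? VE_ACT_EMULATE : VE_ACT_FATAL;
}
```

With the fallback on, a stray struct-overlay access from a driver that slipped
past the io.h helpers gets emulated instead of taking the guest down.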
