linux-kernel - Re: [RFC v1 12/26] x86/tdx: Handle in-kernel MMIO

Message-ID: <3b0cb3fc-87bc-ae18-1a26-f6ad45a56fb5@intel.com>
Date:   Thu, 1 Apr 2021 15:53:49 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Kirill Shutemov <kirill.shutemov@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan <knsathya@...nel.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Raj Ashok <ashok.raj@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC v1 12/26] x86/tdx: Handle in-kernel MMIO

On 4/1/21 3:26 PM, Sean Christopherson wrote:
> On Thu, Apr 01, 2021, Dave Hansen wrote:
>> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
>>> From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
>>>
>>> Handle #VE due to MMIO operations. MMIO triggers #VE with EPT_VIOLATION
>>> exit reason.
>>>
>>> For now we only handle the subset of instructions that the kernel uses
>>> for MMIO operations. User-space access triggers SIGBUS.
>> ..
>>> +	case EXIT_REASON_EPT_VIOLATION:
>>> +		ve->instr_len = tdx_handle_mmio(regs, ve);
>>> +		break;
>>
>> Is MMIO literally the only thing that can cause an EPT violation for TDX
>> guests?
> 
> Any EPT Violation, or specifically EPT Violation #VE?  Any memory access can
> cause an EPT violation, but the VMM will get the ones that lead to VM-Exit.  The
> guest will only get the ones that cause #VE.

I'll rephrase: Is MMIO literally the only thing that can cause us to get
into the EXIT_REASON_EPT_VIOLATION case of the switch() here?

> Assuming you're asking about #VE... No, any shared memory access can take a #VE
> since the VMM controls the shared EPT tables and can clear the SUPPRESS_VE bit 
> at any time.  But, if the VMM is friendly, #VE should be limited to MMIO.

OK, but what are we doing in the case of unfriendly VMMs?  What does
*this* code do as-is, and where do we want to take it?

From the _looks_ of this patch, tdx_handle_mmio() is the be all end all
solution to all EXIT_REASON_EPT_VIOLATION events.
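
To make the concern concrete, here is a minimal user-space sketch of one possible alternative: classify an EPT-violation #VE by checking the faulting guest-physical address against regions the guest itself set up as MMIO, and treat anything else as fatal. Neither the patch nor this thread proposes this. Every name here (struct mmio_range, known_mmio, classify_ept_violation_ve) and the addresses in the table are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a region the guest itself mapped as MMIO. */
struct mmio_range {
	uint64_t start;	/* guest-physical start, inclusive */
	uint64_t len;	/* length in bytes */
};

/* Illustrative table only; not taken from the patch. */
static const struct mmio_range known_mmio[] = {
	{ 0xfec00000ULL, 0x1000 },
	{ 0xfed00000ULL, 0x400 },
};

/* True if [gpa, gpa + size) lies entirely inside one known MMIO range. */
static bool gpa_is_known_mmio(uint64_t gpa, uint64_t size)
{
	for (size_t i = 0; i < sizeof(known_mmio) / sizeof(known_mmio[0]); i++) {
		const struct mmio_range *r = &known_mmio[i];

		/* Written to avoid overflow in gpa + size. */
		if (gpa >= r->start && size <= r->len &&
		    gpa - r->start <= r->len - size)
			return true;
	}
	return false;
}

enum ve_action { VE_EMULATE_MMIO, VE_FATAL };

/* Decide what to do with an EPT-violation #VE at the given address. */
static enum ve_action classify_ept_violation_ve(uint64_t gpa, uint64_t size)
{
	assert(size > 0);
	return gpa_is_known_mmio(gpa, size) ? VE_EMULATE_MMIO : VE_FATAL;
}
```

With a check like this, a #VE injected by an unfriendly VMM on ordinary shared memory would not be silently emulated as if it were MMIO.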

>> But for an OS where we have source for the *ENTIRE* thing, and where we
>> have a chokepoint for MMIO accesses (arch/x86/include/asm/io.h), it
>> seems like an *AWFUL* idea to:
>> 1. Have the kernel set up special mappings for I/O memory
>> 2. Kernel generates special instructions to access that memory
>> 3. Kernel faults on that memory
>> 4. Kernel cracks its own special instructions to see what they were
>>    doing
>> 5. Kernel calls up to host to do the MMIO
>>
>> Instead of doing 2/3/4, why not just have #2 call up to the host
>> directly?  This patch seems a very slow, roundabout way to do
>> paravirtualized MMIO.
>>
>> BTW, there's already some SEV special-casing in io.h.
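
The shortcut in the quoted list (have step 2 call the host directly, skipping the fault and the instruction decode) can be sketched in user-space C as below. This is only a model of the control flow, not kernel code: guest_is_tdx, host_mmio_read32() and mmio_read32() are hypothetical stand-ins, and the "hypercall" is a stub that returns a fake register value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a "running as a TDX guest" check (assumption). */
static bool guest_is_tdx;

/* Counts how often the host was asked to perform an access. */
static unsigned int hypercall_count;

/* Fake device register the stubbed host returns. */
static uint32_t fake_device_reg = 0x12345678;

/* Hypothetical stub for "call up to the host to do the MMIO". */
static uint32_t host_mmio_read32(uintptr_t addr)
{
	(void)addr;
	hypercall_count++;
	return fake_device_reg;
}

/*
 * Chokepoint helper: in a TDX guest, ask the host directly instead of
 * touching the mapping, faulting, and decoding the instruction.
 */
static uint32_t mmio_read32(const volatile uint32_t *addr)
{
	if (guest_is_tdx)
		return host_mmio_read32((uintptr_t)addr);
	return *addr;	/* ordinary load outside a TDX guest */
}
```

The point of the sketch is that the decision is made once, at the helper, so no #VE is taken on this path at all.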
> 
> I implemented #2 a while back for build_mmio_{read,write}(), I'm guessing the
> code is floating around somewhere.  The gotcha is that there are nasty little
> pieces of the kernel that don't use the helpers provided by io.h, e.g. the I/O
> APIC code likes to access MMIO via a struct overlay, so the compiler is free to
> use any instruction that satisfies the constraint.
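
For readers unfamiliar with the distinction in the quoted paragraph, the following user-space sketch contrasts an access that goes through a helper with one made through a struct overlay. The register layout and all names (struct demo_regs, demo_readl, read_data_overlay, read_data_helper) are hypothetical and are not the kernel's I/O APIC code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
struct demo_regs {
	uint32_t index;
	uint32_t unused[3];
	uint32_t data;
};

/* Counts accesses that went through the helper. */
static unsigned int helper_calls;

/* Helper-style access: one function that every caller goes through. */
static uint32_t demo_readl(const volatile uint32_t *addr)
{
	helper_calls++;
	return *addr;
}

/*
 * Overlay-style access: a plain member read. The compiler chooses the
 * load instruction, and the helper above is never involved.
 */
static uint32_t read_data_overlay(volatile struct demo_regs *regs)
{
	return regs->data;
}

/* The same read, routed through the helper. */
static uint32_t read_data_helper(volatile struct demo_regs *regs)
{
	return demo_readl(&regs->data);
}
```

Both functions return the same value, but only the second is visible to anything hooked into the helper, which is why overlay-style accesses have to be found and converted by hand or by tooling.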

So, there aren't an infinite number of these.  It's also 100% possible
to add some tooling to the kernel today to help you find these.  You
could also have added tooling to KVM hosts to help find these.

Folks are *also* saying that we'll need a driver audit just to trust
that drivers aren't vulnerable to attacks from devices or from the host.
This can quite easily be a part of that effort.

> The I/O APIC can and should be forced off, but dollars to donuts says there are
> more special snowflakes lying in wait.  If the kernel uses an allowlist for
> drivers, then in theory it should be possible to hunt down all offenders.  But
> I think we'll want fallback logic to handle kernel MMIO #VEs, especially if the
> kernel needs ISA cracking logic for userspace.  Without fallback logic, any MMIO
> #VE from the kernel would be fatal, which is too harsh IMO since the behavior
> isn't so obviously wrong, e.g. versus the split lock #AC purge where there's no
> legitimate reason for the kernel to generate a split lock.

I'll buy that this patch is convenient for *debugging*.  It helped folks
bootstrap the TDX support and get it going.

IMNHO, if a driver causes a #VE, it's a bug.  Just like if it goes off
the rails and touches bad memory and #GP's or #PF's.

Are there any printk's in the #VE handler?  Guess what those do.  Print
to the console.  Guess what consoles do.  MMIO.  You can't get away from
doing audits of the console drivers.  Sure, you can go make #VE special,
like NMIs, but that's not going to be fun.  At least the guest doesn't
have to deal with the fatality of a nested #VE, but it's still fatal.

I just don't like us pretending that we're Windows and have no control
over the code we run and throwing up our hands.