linux-kernel - Re: [PATCHv5 11/30] x86/tdx: Handle in-kernel MMIO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220310005145.hzv2lzxgs7uxblfr@black.fi.intel.com>
Date:   Thu, 10 Mar 2022 03:51:45 +0300
From:   "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        luto@...nel.org, peterz@...radead.org,
        sathyanarayanan.kuppuswamy@...ux.intel.com, aarcange@...hat.com,
        ak@...ux.intel.com, dan.j.williams@...el.com, david@...hat.com,
        hpa@...or.com, jgross@...e.com, jmattson@...gle.com,
        joro@...tes.org, jpoimboe@...hat.com, knsathya@...nel.org,
        pbonzini@...hat.com, sdeep@...are.com, seanjc@...gle.com,
        tony.luck@...el.com, vkuznets@...hat.com, wanpengli@...cent.com,
        thomas.lendacky@....com, brijesh.singh@....com, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCHv5 11/30] x86/tdx: Handle in-kernel MMIO

On Tue, Mar 08, 2022 at 01:26:28PM -0800, Dave Hansen wrote:
> On 3/2/22 06:27, Kirill A. Shutemov wrote:
> > In non-TDX VMs, MMIO is implemented by providing the guest a mapping
> > which will cause a VMEXIT on access and then the VMM emulating the
> > instruction that caused the VMEXIT. That's not possible for TDX VM.
> > 
> > To emulate an instruction an emulator needs two things:
> > 
> >   - R/W access to the register file to read/modify instruction arguments
> >     and see RIP of the faulted instruction.
> > 
> >   - Read access to memory where instruction is placed to see what to
> >     emulate. In this case it is guest kernel text.
> > 
> > Both of them are not available to VMM in TDX environment:
> > 
> >   - Register file is never exposed to VMM. When a TD exits to the module,
> >     it saves registers into the state-save area allocated for that TD.
> >     The module then scrubs these registers before returning execution
> >     control to the VMM, to help prevent leakage of TD state.
> > 
> >   - Memory is encrypted a TD-private key. The CPU disallows software
> >     other than the TDX module and TDs from making memory accesses using
> >     the private key.
> 
> Memory encryption has zero to do with this.  The TDX isolation
> mechanisms are totally discrete from memory encryption, although they
> are "neighbors" of sorts.

Hm. I don't see why you say encryption is not relevant. VMM (host kernel)
has ultimate access to guest memory cypher text. It can read it as
cypher text without any issue (using KeyID-0).

Could you elaborate on the point?

> > == Patching TDX drivers ==
> > 
> > Rather than touching the entire kernel, it might also be possible to
> > just go after drivers that use MMIO in TDX guests.  Right now, that's
> > limited only to virtio and some x86-specific drivers.
> > 
> > All virtio MMIO appears to be done through a single function, which
> > makes virtio eminently easy to patch.
> > 
> > This approach will be adopted in the future, removing the bulk of
> > MMIO #VEs. The #VE-based MMIO will remain serving non-virtio use cases.
> 
> This still doesn't *quite* do it for me for a justification.  Why can't
> the non-virtio cases be converted as well?  Why doesn't the "patching
> MMIO sites" work for x86 code too?
> 
> You really need to convince us that *this* approach will be required
> forever.

What if I add:

	Many drivers can potentially be used inside TDX guest (e.g. via device
	passthough or random device emulation by VMM), but very few will.
	Patching every possible driver is not practical. #VE-based MMIO provides
	functionality for everybody. Performance-critical cases can be optimized
	as needed.

?

> > +static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
> > +{
> > +	char buffer[MAX_INSN_SIZE];
> > +	unsigned long *reg, val;
> > +	struct insn insn = {};
> > +	enum mmio_type mmio;
> > +	int size, extend_size;
> > +	u8 extend_val = 0;
> > +
> > +	if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE))
> > +		return false;
> > +
> > +	if (insn_decode(&insn, buffer, MAX_INSN_SIZE, INSN_MODE_64))
> > +		return false;
> > +
> > +	mmio = insn_decode_mmio(&insn, &size);
> > +	if (WARN_ON_ONCE(mmio == MMIO_DECODE_FAILED))
> > +		return false;
> > +
> > +	if (mmio != MMIO_WRITE_IMM && mmio != MMIO_MOVS) {
> > +		reg = insn_get_modrm_reg_ptr(&insn, regs);
> > +		if (!reg)
> > +			return false;
> > +	}
> > +
> > +	ve->instr_len = insn.length;
> > +
> > +	switch (mmio) {
> > +	case MMIO_WRITE:
> > +		memcpy(&val, reg, size);
> > +		return mmio_write(size, ve->gpa, val);
> > +	case MMIO_WRITE_IMM:
> > +		val = insn.immediate.value;
> > +		return mmio_write(size, ve->gpa, val);
> > +	case MMIO_READ:
> > +	case MMIO_READ_ZERO_EXTEND:
> > +	case MMIO_READ_SIGN_EXTEND:
> > +		break;
> > +	case MMIO_MOVS:
> > +	case MMIO_DECODE_FAILED:
> > +		/*
> > +		 * MMIO was accessed with an instruction that could not be
> > +		 * decoded or handled properly. It was likely not using io.h
> > +		 * helpers or accessed MMIO accidentally.
> > +		 */
> > +		return false;
> > +	default:
> > +		/* Unknown insn_decode_mmio() decode value? */
> > +		BUG();
> > +	}
> 
> BUG()s are bad.  The set of insn_decode_mmio() return codes is known at
> compile time.  If we're really on the lookout for unknown values, why
> not just:
> 
> 	BUILD_BUG_ON(NR_MMIO_TYPES != 6); // or whatever

This doesn't work.

We can pretend that the function only forced to return values from the
enum. The truth is that it can return whatever int it wants. Type system
in C is too week to guarantee anything here. The BUG() is backstop for it.

This BUILD_BUG_ON() is useless. Compiler complains about missing case in
the switch anyway.

> Also, there are *lots* of ways for this function to just fall over and
> fail.  Why does this particular failure mode deserve a BUG()?
> 
> Is there a reason a BUG() is better than returning failure which
> presumably sets off the #GP-like logic?

BUG() here makes it clear that the handler itself is buggy. Returning
false and kicking in #GP-like logic indicates that something wrong with
the code that triggered #VE. I think it is an important distinction.

> Also, now that I've read this a few times, I've been confused by the
> same thing a few times.  This is handling instructions that might read
> or write or do both, correct?
> 
> Should that be made explicit in a function comment?

Hm. Okay. Something like

/* Handle reads from and writes to MMIO region. */

before the function?

-- 
 Kirill A. Shutemov