linux-kernel - Re: [PATCHv2 04/29] x86/traps: Add #VE support for TDX guest

Message-ID: <20220212014252.un2hajlml7t46p6s@black.fi.intel.com>
Date:   Sat, 12 Feb 2022 04:42:52 +0300
From:   "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     mingo@...hat.com, bp@...en8.de, dave.hansen@...el.com,
        luto@...nel.org, peterz@...radead.org,
        sathyanarayanan.kuppuswamy@...ux.intel.com, aarcange@...hat.com,
        ak@...ux.intel.com, dan.j.williams@...el.com, david@...hat.com,
        hpa@...or.com, jgross@...e.com, jmattson@...gle.com,
        joro@...tes.org, jpoimboe@...hat.com, knsathya@...nel.org,
        pbonzini@...hat.com, sdeep@...are.com, seanjc@...gle.com,
        tony.luck@...el.com, vkuznets@...hat.com, wanpengli@...cent.com,
        x86@...nel.org, linux-kernel@...r.kernel.org,
        Sean Christopherson <sean.j.christopherson@...el.com>
Subject: Re: [PATCHv2 04/29] x86/traps: Add #VE support for TDX guest

On Tue, Feb 01, 2022 at 10:02:41PM +0100, Thomas Gleixner wrote:
> > +/*
> > + * Virtualization Exceptions (#VE) are delivered to TDX guests due to
> > + * specific guest actions which may happen in either user space or the
> > + * kernel:
> > + *
> > + *  * Specific instructions (WBINVD, for example)
> > + *  * Specific MSR accesses
> > + *  * Specific CPUID leaf accesses
> > + *  * Access to unmapped pages (EPT violation)
> > + *
> > + * In the settings that Linux will run in, virtualization exceptions are
> > + * never generated on accesses to normal, TD-private memory that has been
> > + * accepted.
> > + *
> > + * Syscall entry code has a critical window where the kernel stack is not
> > + * yet set up. Any exception in this window leads to hard-to-debug issues
> > + * and can be exploited for privilege escalation. Exceptions in the NMI
> > + * entry code also cause issues. Returning from the exception handler with
> > + * IRET will re-enable NMIs, and a nested NMI will corrupt the NMI stack.
> > + *
> > + * For these reasons, the kernel avoids #VEs during the syscall gap and
> > + * the NMI entry code. Entry code paths do not access TD-shared memory or
> > + * MMIO regions, and do not use #VE-triggering MSRs, instructions, or
> > + * CPUID leaves.
> 
> How is that enforced or validated? What checks for a violation of that
> assumption?

Hm. I think we would have to rely on a code audit for it.

The entry code contains nothing that induces #VE: no port I/O, CPUID,
HLT, MONITOR/MWAIT, WBINVD/INVD, or VMCALL.
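Part of such an audit could be mechanized. The sketch below is a hypothetical helper, not an existing kernel tool: `ve_inducing[]`, `is_ve_inducing()` and `audit_mnemonics()` are invented names. It assumes the entry code has been disassembled (for example with `objdump -d`) into a list of AT&T-syntax mnemonics, and flags any that appear in the list from the reply. CPUID only raises #VE for some leaves, so it is flagged conservatively; RDMSR/WRMSR are left out because whether they raise #VE depends on the MSR.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: mnemonics (AT&T syntax, as printed by objdump) that may
 * raise #VE in a TDX guest. CPUID is listed conservatively even though
 * only some leaves trap. RDMSR/WRMSR are omitted: they would need a
 * per-MSR check rather than a per-instruction one.
 */
static const char *const ve_inducing[] = {
	"in", "out", "insb", "insw", "insl", "outsb", "outsw", "outsl",
	"cpuid", "hlt", "monitor", "mwait", "wbinvd", "invd", "vmcall",
};

/* Return 1 if @mnemonic exactly matches a potentially #VE-inducing one. */
static int is_ve_inducing(const char *mnemonic)
{
	size_t n = sizeof(ve_inducing) / sizeof(ve_inducing[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(mnemonic, ve_inducing[i]) == 0)
			return 1;
	return 0;
}

/* Count how many of the @n mnemonics in @insns may raise #VE. */
static size_t audit_mnemonics(const char *const *insns, size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		hits += is_ve_inducing(insns[i]);
	return hits;
}
```

Exact string matching keeps look-alikes such as `inc` or `int3` from being reported as `in`. A scan like this only sees instructions; it says nothing about which memory the entry code touches.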

There's a single MSR read of MSR_GS_BASE in paranoid_entry(), but it
doesn't trigger #VE either.

The other possible source of #VE is shared memory. If somebody tricks
the kernel into accessing shared memory from the entry code, we have a
bigger problem to deal with than a #VE in the syscall gap.
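In a TD, whether a guest physical address is private or shared is encoded in a single GPA bit, the SHARED bit: the top bit of the guest physical address width (bit 47 with a 48-bit GPAW, bit 51 with a 52-bit GPAW). The minimal sketch below shows that check. `shared_mask()` and `is_shared_gpa()` are hypothetical helpers, not the kernel's TDX code, which derives and tracks the mask itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: the SHARED bit of a TDX guest physical address is the
 * top bit of the GPA width, i.e. bit (gpa_width - 1).
 */
static uint64_t shared_mask(unsigned int gpa_width)
{
	return 1ULL << (gpa_width - 1);
}

/* True if @gpa points at TD-shared (host-visible) memory. */
static bool is_shared_gpa(uint64_t gpa, unsigned int gpa_width)
{
	return (gpa & shared_mask(gpa_width)) != 0;
}
```

A shared address is exactly the kind of access that entry code must never make, because the host controls its mapping and can make it fault.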

Or do you have something stricter than a code audit in mind? I don't see
what it would be.

-- 
 Kirill A. Shutemov