[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yc30LewCB5u6lVpW@zn.tnic>
Date: Thu, 30 Dec 2021 19:02:21 +0100
From: Borislav Petkov <bp@...en8.de>
To: "Kirill A. Shutemov" <kirill@...temov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
tglx@...utronix.de, mingo@...hat.com, dave.hansen@...el.com,
luto@...nel.org, peterz@...radead.org,
sathyanarayanan.kuppuswamy@...ux.intel.com, aarcange@...hat.com,
ak@...ux.intel.com, dan.j.williams@...el.com, david@...hat.com,
hpa@...or.com, jgross@...e.com, jmattson@...gle.com,
joro@...tes.org, jpoimboe@...hat.com, knsathya@...nel.org,
pbonzini@...hat.com, sdeep@...are.com, seanjc@...gle.com,
tony.luck@...el.com, vkuznets@...hat.com, wanpengli@...cent.com,
x86@...nel.org, linux-kernel@...r.kernel.org,
Sean Christopherson <sean.j.christopherson@...el.com>
Subject: Re: [PATCH 04/26] x86/traps: Add #VE support for TDX guest
On Thu, Dec 30, 2021 at 06:41:27PM +0300, Kirill A. Shutemov wrote:
> The updated commit message is below. Let me know if something is unclear.
>
> ----------------------------8<-------------------------------------------
>
> Virtualization Exceptions (#VE) are delivered to TDX guests due to
> specific guest actions which may happen in either user space or the
> kernel:
>
> * Specific instructions (WBINVD, for example)
> * Specific MSR accesses
> * Specific CPUID leaf accesses
> * Access to unmapped pages (EPT violation)
>
> In the settings that Linux will run in, virtual exceptions are never
virtualization exceptions
> generated on accesses to normal, TD-private memory that has been
> accepted.
>
> Syscall entry code has a critical window where the kernel stack is not
> yet set up. Any exception in this window leads to hard to debug issues
> and can be exploited for privilege escalation. Exceptions in the NMI
> entry code also cause issues. IRET from the exception handle will
"Returning from the exception handler with IRET will... "
> re-enable NMIs and nested NMI will corrupt the NMI stack.
>
> For these reasons, the kernel avoids #VEs during the syscall gap and
> the NMI entry code. Entry code paths do not access TD-shared memory,
> MMIO regions, use #VE triggering MSRs, instructions, or CPUID leaves
> that might generate #VE. VMM can remove memory from TD at any point,
> but access to unaccepted (or missing) private memory leads to VM
> termination, not to #VE.
>
> Similarly, to page faults and breakpoints, #VEs are allowed in NMI
"Similarly to" - no comma.
> handlers once the kernel is ready to deal with nested NMIs.
>
> During #VE delivery, all interrupts, including NMIs, are blocked until
> TDGETVEINFO is called. It prevents #VE nesting until the kernel reads
> the VE info.
>
> If a guest kernel action which would normally cause a #VE occurs in
> the interrupt-disabled region before TDGETVEINFO, a #DF (fault
> exception) is delivered to the guest which will result in an oops.
That up to here can go over the #VE handler.
> Add basic infrastructure to handle any #VE which occurs in the kernel
> or userspace. Later patches will add handling for specific #VE
> scenarios.
>
> For now, convert unhandled #VE's (everything, until later in this
> series) so that they appear just like a #GP by calling the
> ve_raise_fault() directly. The ve_raise_fault() function is similar
> to #GP handler and is responsible for sending SIGSEGV to userspace
> and CPU die and notifying debuggers and other die chain users.
Yap, better.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Powered by blists - more mailing lists