[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <877d8t2ykp.ffs@tglx>
Date: Thu, 17 Mar 2022 01:48:54 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
mingo@...hat.com, bp@...en8.de, dave.hansen@...el.com,
luto@...nel.org, peterz@...radead.org
Cc: sathyanarayanan.kuppuswamy@...ux.intel.com, aarcange@...hat.com,
ak@...ux.intel.com, dan.j.williams@...el.com, david@...hat.com,
hpa@...or.com, jgross@...e.com, jmattson@...gle.com,
joro@...tes.org, jpoimboe@...hat.com, knsathya@...nel.org,
pbonzini@...hat.com, sdeep@...are.com, seanjc@...gle.com,
tony.luck@...el.com, vkuznets@...hat.com, wanpengli@...cent.com,
thomas.lendacky@....com, brijesh.singh@....com, x86@...nel.org,
linux-kernel@...r.kernel.org,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Sean Christopherson <sean.j.christopherson@...el.com>,
Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCHv6 07/30] x86/traps: Add #VE support for TDX guest
On Wed, Mar 16 2022 at 05:08, Kirill A. Shutemov wrote:
> +void tdx_get_ve_info(struct ve_info *ve)
> +{
> + struct tdx_module_output out;
> +
> + /*
> + * Called during #VE handling to retrieve the #VE info from the
> + * TDX module.
> + *
> + * This should called done early in #VE handling. A "nested"
... has to be called early ..
> + * #VE which occurs before this will raise a #DF and is not
> + * recoverable.
> + */
> + tdx_module_call(TDX_GET_VEINFO, 0, 0, 0, 0, &out);
> +
> + /* Interrupts and NMIs can be delivered again. */
Please put a new line between this comment and the code below because
they are completely unrelated. Also interrupts cannot be delivered here
because this code runs with interrupts disabled ...
And I rather have this comment above the tdx_module_call() invocation:
* recoverable.
*
* <Useful comment about VE info and NMIs>
When I reviewed this last time, Sean provided a very concise comment for
this.
*/
tdx_module_call(...);
/* Transfer the output parameters */
> + ve->exit_reason = out.rcx;
....
Hmm?
> +/*
> + * Virtualization Exceptions (#VE) are delivered to TDX guests due to
> + * specific guest actions which may happen in either user space or the
> + * kernel:
> + *
> + * * Specific instructions (WBINVD, for example)
> + * * Specific MSR accesses
> + * * Specific CPUID leaf accesses
> + * * Access to specific guest physical addresses
> + *
> + * In the settings that Linux will run in, virtualization exceptions are
> + * never generated on accesses to normal, TD-private memory that has been
> + * accepted.
> + *
> + * Syscall entry code has a critical window where the kernel stack is not
> + * yet set up. Any exception in this window leads to hard to debug issues
> + * and can be exploited for privilege escalation. Exceptions in the NMI
> + * entry code also cause issues. Returning from the exception handler with
> + * IRET will re-enable NMIs and nested NMI will corrupt the NMI stack.
> + *
> + * For these reasons, the kernel avoids #VEs during the syscall gap and
> + * the NMI entry code. Entry code paths do not access TD-shared memory,
> + * MMIO regions, use #VE triggering MSRs, instructions, or CPUID leaves
> + * that might generate #VE.
I asked that before:
"How is that enforced or validated? What checks for a violation of that
assumption?"
This is still exactly the same comment which is based on testing which
did not yet explode in your face, right?
So what's the point of this blurb? Create expectations which are not
accountable?
The point is that any #VE in such a code path is fatal and you better
come up with some reasonable explanation why this is not the case in
those code pathes and how a potential violation of that assumption might
be detected especially in rarely used corner cases. If such a violation
is not detectable by audit, CI, static code analysis or whatever then
document the consequences instead of pretending that the problem does
not exist and the kernel is perfect today and forever.
Thanks,
tglx
Powered by blists - more mailing lists