Message-ID: <CAAYXXYw1YpZx1AaOu0TgR9yR9Foi6_jh8XkbGU4ZM2TFTM=nSA@mail.gmail.com>
Date: Mon, 7 Nov 2022 14:53:37 -0800
From: Erdem Aktas <erdemaktas@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: "Nakajima, Jun" <jun.nakajima@...el.com>,
Guorui Yu <GuoRui.Yu@...ux.alibaba.com>,
kirill.shutemov@...ux.intel.com, ak@...ux.intel.com, bp@...en8.de,
dan.j.williams@...el.com, david@...hat.com,
elena.reshetova@...el.com, hpa@...or.com,
linux-kernel@...r.kernel.org, luto@...nel.org, mingo@...hat.com,
peterz@...radead.org, sathyanarayanan.kuppuswamy@...ux.intel.com,
seanjc@...gle.com, tglx@...utronix.de, thomas.lendacky@....com,
x86@...nel.org
Subject: Re: [PATCH 2/2] x86/tdx: Do not allow #VE due to EPT violation on the
private memory
On Fri, Nov 4, 2022 at 3:50 PM Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 11/4/22 15:36, Erdem Aktas wrote:
> > On Fri, Oct 28, 2022 at 7:12 AM Kirill A. Shutemov
> > <kirill.shutemov@...ux.intel.com> wrote:
> >> + *
> >> + * Kernel has no legitimate use-cases for #VE on private memory. It is
> >> + * either a guest kernel bug (like access of unaccepted memory) or
> >> + * malicious/buggy VMM that removes guest page that is still in use.
> >> + *
> >
> > I think this statement is too strong and I have few concerns on this approach.
> > I understand that there is an issue of handling #VEs on private pages
> > but it seems like we are just hiding the problem with this approach
> > instead of fixing it - I do not have any fix in my mind- .
> > First there is a feature of injecting #VE to handle unaccepted pages
> > at runtime and accept them on-demand, now the statement is saying this
> > was an unnecessary feature (why is it there at all then?) at all as
> > there is no legitimate use case.
>
> We're doing on-demand page acceptance. We just don't need a #VE to
> drive it. Why is it in the TDX module then? Inertia? Because it got
> too far along in the process before anyone asked me or some of the other
> x86 kernel folks to look at it hard.
>
> > I wonder if this will limit how we can implement the lazy TDACCEPT.
> > There are multiple ideas floating now.
> > https://github.com/intel/tdx/commit/9b3ef9655b695d3c67a557ec016487fded8b0e2b
> > has 3 implementation choices where "Accept a block of memory on the
> > first use." option is implemented. Actually it says "Accept a block
> > of memory on the first use." but it is implemented as "Accept a block
> > of memory on the first allocation". The comments in this code also
> > raises concerns on the performance.
> >
> > As of now, we do not know which one of those ideas will provide an
> > acceptable performance for booting large size VMs. If the performance
> > overhead is high, we can always implement the lazy TDACCEPT as when
> > the first time a guest accesses an unaccepted memory, #VE can do the TDACCEPT.
>
> Could you please elaborate a bit on what you think the distinction is
> between:
>
> * Accept on first use
> and
> * Accept on allocation
>
> Surely, for the vast majority of memory, it's allocated and then used
> pretty quickly. As in, most allocations are __GFP_ZERO so they're
> allocated and "used" before they even leave the allocator. So, in
> practice, they're *VERY* close to equivalent.
>
> Where do you see them diverging? Why does it matter?
>
For a VM with a very large memory size, say close to 800G, it might
take a really long time to finish initialization. If all allocations
were __GFP_ZERO, I agree it would not matter, but as far as I remember
(I need to run some benchmarks to validate this) that is not what we
observed. Let me run a few tests to provide more input on this. In the
meantime, if you have already run some benchmarks, it would be great to
see the results.

What I see in the code is that the "accept_page" function zeroes every
unaccepted page even if the __GFP_ZERO flag is not set, and if
__GFP_ZERO is set, we zero those pages a second time. I also see a lot
of concerning comments like "Page acceptance can be very slow.".

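To make the double-zeroing concern concrete, here is a minimal userspace
sketch, not kernel code. It assumes, as described above, that accepting a
page always zeroes it. The names model_accept_page, model_alloc_page,
MODEL_GFP_ZERO and struct page_model are hypothetical stand-ins invented
for this illustration; they are not the functions in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096
#define MODEL_GFP_ZERO  0x1u   /* hypothetical stand-in for __GFP_ZERO */

/* Toy page: contents, acceptance state, and a count of zeroing passes. */
struct page_model {
	unsigned char data[MODEL_PAGE_SIZE];
	bool accepted;
	unsigned int zero_passes;
};

/* Assumption: acceptance zeroes the page regardless of allocation flags. */
void model_accept_page(struct page_model *p)
{
	assert(p != NULL);
	if (p->accepted)
		return;
	memset(p->data, 0, MODEL_PAGE_SIZE);
	p->zero_passes++;
	p->accepted = true;
}

/* Allocation accepts the page first, then honors the zeroing flag. */
void model_alloc_page(struct page_model *p, unsigned int flags)
{
	model_accept_page(p);
	if (flags & MODEL_GFP_ZERO) {
		memset(p->data, 0, MODEL_PAGE_SIZE);   /* second pass */
		p->zero_passes++;
	}
}
```

In this model, allocating a fresh unaccepted page with MODEL_GFP_ZERO set
touches the page twice, and without the flag it is still zeroed once. How
much that costs on real hardware is exactly what the benchmarks need to
show.
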
What I mean by "Accept on allocation" is leaving the memory
allocation as it is and using the #VE handler to accept pages the
first time they are accessed.
Let me come back with some numbers on this, which might take some time
to collect.

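As a rough illustration of where the two approaches can diverge, here is a
small userspace model, not kernel code. It only counts accept operations
and says nothing about their real cost. All names in it (struct mem_model,
model_accept, alloc_accept_eager, touch_accept_on_fault) are hypothetical,
and the "fault" is just a check of a flag standing in for a #VE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PAGES 8

/* Toy memory: per-page acceptance state plus a counter of accepts. */
struct mem_model {
	bool accepted[MODEL_NR_PAGES];
	unsigned int accept_calls;
};

/* Accept one page if it has not been accepted yet. */
void model_accept(struct mem_model *m, size_t pfn)
{
	assert(pfn < MODEL_NR_PAGES);
	if (!m->accepted[pfn]) {
		m->accepted[pfn] = true;
		m->accept_calls++;
	}
}

/* Policy A: every page in the range is accepted when it is allocated. */
void alloc_accept_eager(struct mem_model *m, size_t start, size_t n)
{
	for (size_t i = 0; i < n; i++)
		model_accept(m, start + i);
}

/* Policy B: allocation does nothing; the first access accepts the page. */
void touch_accept_on_fault(struct mem_model *m, size_t pfn)
{
	assert(pfn < MODEL_NR_PAGES);
	if (!m->accepted[pfn])          /* simulated #VE on unaccepted page */
		model_accept(m, pfn);
}
```

With policy A, allocating all 8 pages costs 8 accepts up front. With policy
B, the cost is paid only for pages that are actually touched, and only on
the first touch. Whether that difference matters in practice depends on how
much allocated memory goes untouched, which is what I want to measure.
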
> > I am not trying to solve the lazy TDACCEPT problem here but all I am
> > trying to say is that, there might be legitimate use cases for #VE on
> > private memory and this patch limits any future improvement we might
> > need to do on lazy TDACCEPT implementation.
>
> The kernel can't take exceptions on arbitrary memory accesses. I have
> *ZERO* idea how to handle page acceptance on an access to a per-cpu
> variable referenced in syscall entry, or the NMI stack when we've
> interrupted kernel code with a user GSBASE value.
>
> So, we either find *ALL* the kernel memory that needs to be pre-accepted
> at allocation time (like kernel stacks) or we just say that all
> allocated memory needs to be accepted before we let it be allocated.
>
> One of those is really easy. The other means a boatload of code audits.
> I know. I had to do that kind of exercise to get KPTI to work. I
> don't want to do it again. It was worth it for KPTI when the world was
> on fire. TDX isn't that important IMNHO. There's an easier way.