Message-ID: <4b863492fd33dce28a3a61662d649987b7d5066d.camel@linux.ibm.com>
Date: Wed, 01 Sep 2021 09:45:28 -0700
From: James Bottomley <jejb@...ux.ibm.com>
To: David Hildenbrand <david@...hat.com>,
Andy Lutomirski <luto@...nel.org>,
Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>, kvm list <kvm@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Borislav Petkov <bp@...en8.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Joerg Roedel <jroedel@...e.de>,
Andi Kleen <ak@...ux.intel.com>,
David Rientjes <rientjes@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Tom Lendacky <thomas.lendacky@....com>,
Thomas Gleixner <tglx@...utronix.de>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Varad Gautam <varad.gautam@...e.com>,
Dario Faggioli <dfaggioli@...e.com>,
the arch/x86 maintainers <x86@...nel.org>,
linux-mm@...ck.org, linux-coco@...ts.linux.dev,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Sathyanarayanan Kuppuswamy
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
Dave Hansen <dave.hansen@...el.com>,
Yu Zhang <yu.c.zhang@...ux.intel.com>
Subject: Re: [RFC] KVM: mm: fd-based approach for supporting KVM guest private memory

On Wed, 2021-09-01 at 18:37 +0200, David Hildenbrand wrote:
> On 01.09.21 18:31, James Bottomley wrote:
> > On Wed, 2021-09-01 at 18:22 +0200, David Hildenbrand wrote:
> > > On 01.09.21 18:18, James Bottomley wrote:
> > > > On Wed, 2021-09-01 at 08:54 -0700, Andy Lutomirski wrote:
> > > > [...]
> > > > > If you want to swap a page on TDX, you can't. Sorry, go
> > > > > directly to jail, do not collect $200.
> > > >
> > > > Actually, even on SEV-ES you can't either. You can read the
> > > > encrypted page and write it out if you want, but unless you
> > > > swap it back to the exact same physical memory location, the
> > > > encryption key won't work. Since we don't guarantee this for
> > > > swap, I think swap won't actually work for any confidential
> > > > computing environment.
> > > >
> > > > > So I think there are literally zero code paths that currently
> > > > > call try_to_unmap() that will actually work like that on
> > > > > TDX. If we run out of memory on a TDX host, we can kill the
> > > > > guest completely and reclaim all of its memory (which
> > > > > probably also involves killing QEMU or whatever other user
> > > > > program is in charge), but that's really our only option.
> > > >
> > > > I think our only option for swap is guest co-operation. We're
> > > > going to have to inflate a balloon or something in the guest
> > > > and have the guest driver do some type of bounce of the page,
> > > > where it becomes an unencrypted page in the guest (so the host
> > > > can read it without the physical address keying of the
> > > > encryption getting in the way) but actually encrypted with a
> > > > swap transfer key known only to the guest. I assume we can use
> > > > the page acceptance infrastructure currently being discussed
> > > > elsewhere to do swap back in as well ... the host provides the
> > > > guest with the encrypted swap page and the guest has to decrypt
> > > > it and place it in encrypted guest memory.
> > >
> > > Ballooning is indeed *the* mechanism to avoid swapping in the
> > > hypervisor and much rather let the guest swap. Shame it requires
> > > trusting a guest, which we, in general, can't. Not to mention
> > > other issues we already do have with ballooning (latency, broken
> > > auto-ballooning, over-inflating, ...).
> >
> > Well not necessarily, but it depends how clever we want to get. If
> > you look over on the OVMF/edk2 list, there's a proposal to do guest
> > migration via a mirror VM that invokes a co-routine embedded in the
> > OVMF binary:
>
> Yes, I heard of that. "Interesting" design.

Heh, well what other suggestion do you have? The problem is there
needs to be code somewhere to perform some operations that's trusted by
both the guest and the host. The only element for a confidential VM
that has this shared trust is the OVMF firmware, so it seems logical to
use it.

>
> > https://patchew.org/EDK2/20210818212048.162626-1-tobin@linux.ibm.com/
> >
> > This gives us a page encryption mechanism that's provided by the
> > host but accepted via the guest using attestation, meaning we have
> > a mutually trusted piece of code that we can use to extract encrypted
> > pages. It does seem it could be enhanced to do swapping for us as
> > well if that's a road we want to go down?
>
> Right, but that's then no longer ballooning, unless I am missing
> something important. You'd ask the guest to export/import, and you
> can trust it. But do we want to call something like that out of
> random kernel context when swapping/writeback, ...? Hard to tell.
> Feels like it won't win in a beauty contest.

What I was thinking is that OVMF can emulate devices in this trusted
code ... another potential use for it is a trusted vTPM for SEV-SNP so
we can do measured boot. To use it we'd give the guest kernel some
type of virtual swap driver that attaches to this OVMF device. I
suppose by the time we've done this, it really does look like a
balloon, but I'd like to think of it more as a paravirt memory
controller since it might be used to make a guest more co-operative in
a host overcommit situation.

That's not to say we *should* do this, merely that it doesn't have to
look like a pig with lipstick.
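
For illustration only, the data flow of such a guest-side swap bounce
could be sketched roughly as below. This is a toy Python model and not
anything taken from the OVMF proposal: the Guest class, the swap_out()
and swap_in() names, and the SHA-256 based keystream are all made up
for the example. A real implementation would use a proper AEAD cipher
and would live in the guest driver or the trusted firmware code.

```python
import hashlib
import hmac
import os

PAGE_SIZE = 4096  # assumed page size for the sketch


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Not real crypto."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "little"))
        out += block.digest()
        counter += 1
    return bytes(out[:length])


class Guest:
    """Hypothetical guest side of a paravirt swap device."""

    def __init__(self) -> None:
        # Swap transfer key: generated in the guest, never given to the host.
        root = os.urandom(32)
        self._enc_key = hashlib.sha256(root + b"enc").digest()
        self._mac_key = hashlib.sha256(root + b"mac").digest()
        # gfn -> page contents; stands in for encrypted guest memory.
        self.memory: dict[int, bytes] = {}

    def _tag(self, gfn: int, nonce: bytes, ct: bytes) -> bytes:
        msg = gfn.to_bytes(8, "little") + nonce + ct
        return hmac.new(self._mac_key, msg, hashlib.sha256).digest()

    def swap_out(self, gfn: int) -> bytes:
        """Bounce a page out: the host only ever sees the returned blob."""
        page = self.memory.pop(gfn)
        nonce = os.urandom(16)
        ks = _keystream(self._enc_key, nonce, len(page))
        ct = bytes(a ^ b for a, b in zip(page, ks))
        return nonce + ct + self._tag(gfn, nonce, ct)

    def swap_in(self, gfn: int, blob: bytes) -> None:
        """Accept a blob from the host, verify it, and restore the page."""
        nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
        if not hmac.compare_digest(tag, self._tag(gfn, nonce, ct)):
            raise ValueError("swap blob rejected: wrong page or tampered")
        ks = _keystream(self._enc_key, nonce, len(ct))
        self.memory[gfn] = bytes(a ^ b for a, b in zip(ct, ks))
```

The host side is then just an opaque store keyed by guest frame number:
it keeps whatever blob swap_out() returned and hands it back later, and
the guest refuses any blob that was altered or belongs to another page.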
James