linux-kernel - Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

Message-ID: <YqObJ4v2k7W+O2j9@google.com>
Date:   Fri, 10 Jun 2022 19:27:35 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Chao Peng <chao.p.peng@...ux.intel.com>,
        Quentin Perret <qperret@...gle.com>,
        Steven Price <steven.price@....com>,
        kvm list <kvm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
        Linux API <linux-api@...r.kernel.org>, qemu-devel@...gnu.org,
        Paolo Bonzini <pbonzini@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        the arch/x86 maintainers <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, Hugh Dickins <hughd@...gle.com>,
        Jeff Layton <jlayton@...nel.org>,
        "J . Bruce Fields" <bfields@...ldses.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Rapoport <rppt@...nel.org>,
        "Maciej S . Szmigiero" <mail@...iej.szmigiero.name>,
        Vlastimil Babka <vbabka@...e.cz>,
        Vishal Annapurve <vannapurve@...gle.com>,
        Yu Zhang <yu.c.zhang@...ux.intel.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Andi Kleen <ak@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>,
        Marc Zyngier <maz@...nel.org>, Will Deacon <will@...nel.org>
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM
 guest private memory

On Fri, Jun 10, 2022, Andy Lutomirski wrote:
> On Mon, Apr 25, 2022 at 1:31 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Mon, Apr 25, 2022, Andy Lutomirski wrote:
> > >
> > >
> > > On Mon, Apr 25, 2022, at 6:40 AM, Chao Peng wrote:
> > > > On Sun, Apr 24, 2022 at 09:59:37AM -0700, Andy Lutomirski wrote:
> > > >>
> > >
> > > >>
> > > >> 2. Bind the memfile to a VM (or at least to a VM technology).  Now it's in
> > > >> the initial state appropriate for that VM.
> > > >>
> > > >> For TDX, this completely bypasses the cases where the data is prepopulated
> > > >> and TDX can't handle it cleanly.
> >
> > I believe TDX can handle this cleanly, TDH.MEM.PAGE.ADD doesn't require that the
> > source and destination have different HPAs.  There's just no pressing need to
> > support such behavior because userspace is highly motivated to keep the initial
> > image small for performance reasons, i.e. burning a few extra pages while building
> > the guest is a non-issue.
> 
> Following up on this, rather belatedly.  After re-reading the docs,
> TDX can populate guest memory using TDH.MEM.PAGE.ADD, but see Intel®
> TDX Module Base Spec v1.5, section 2.3, step D.4 substeps 1 and 2
> here:
> 
> https://www.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1.5-base-spec-348549001.pdf
> 
> For each TD page:
> 
> 1. The host VMM specifies a TDR as a parameter and calls the
> TDH.MEM.PAGE.ADD function. It copies the contents from the TD
> image page into the target TD page which is encrypted with the TD
> ephemeral key. TDH.MEM.PAGE.ADD also extends the TD
> measurement with the page GPA.
> 
> 2. The host VMM extends the TD measurement with the contents of
> the new page by calling the TDH.MR.EXTEND function on each 256-
> byte chunk of the new TD page.
> 
> So this is a bit like SGX.  There is a specific series of operations
> that have to be done in precisely the right order to reproduce the
> intended TD measurement.  Otherwise the guest will boot and run until
> it tries to get a report and then it will have a hard time getting
> anyone to believe its report.
> 
> So I don't think the host kernel can get away with host userspace just
> providing pre-populated memory.  Userspace needs to tell the host
> kernel exactly what sequence of adds, extends, etc to perform and in
> what order, and the host kernel needs to do precisely what userspace
> asks it to do.  "Here's the contents of memory" doesn't cut it unless
> the tooling that builds the guest image matches the exact semantics
> that the host kernel provides.

For TDX, yes, a KVM ioctl() is mandatory for all intents and purposes since adding
non-zero memory into the guest requires a SEAMCALL.  My "idea", which I'm not sure
would actually work, which is more than a bit contrived, and which I don't think is
remotely critical to support, is to let userspace fill the guest private memory
directly and then use the private page as both the source and the target of
TDH.MEM.PAGE.ADD.

That would avoid having to double allocate memory for the initial guest image.  But
like I said, contrived and low priority.
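
To illustrate the ordering point quoted above (the sequence of adds and extends
determines the final TD measurement), here is a toy model in Python.  It is a sketch
only: the record layouts, the "ADD"/"EXT" tags, the helper names extend() and
measure(), and the use of SHA-384 as a simple hash chain are all assumptions made
for illustration and are not the TDX module's actual measurement format.  The only
details taken from the quoted spec text are that a page add contributes the page GPA
and that page contents are extended in 256-byte chunks.

```python
import hashlib

PAGE_SIZE = 4096
CHUNK_SIZE = 256  # the quoted spec text extends the measurement per 256-byte chunk


def extend(measurement: bytes, record: bytes) -> bytes:
    """Fold one record into a running hash chain (hypothetical record format)."""
    return hashlib.sha384(measurement + record).digest()


def measure(pages):
    """Toy measurement over an ordered list of (gpa, page_bytes) pairs.

    Models "add page, then extend with its contents chunk by chunk".
    This is NOT the real TDX measurement computation.
    """
    m = bytes(48)  # start from an all-zero 48-byte value (assumption)
    for gpa, data in pages:
        assert len(data) == PAGE_SIZE
        # Step 1: the page add contributes the GPA.
        m = extend(m, b"ADD" + gpa.to_bytes(8, "little"))
        # Step 2: each 256-byte chunk of the page contributes its contents.
        for off in range(0, PAGE_SIZE, CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            m = extend(m, b"EXT" + (gpa + off).to_bytes(8, "little") + chunk)
    return m


page_a = (0x1000, b"\xaa" * PAGE_SIZE)
page_b = (0x2000, b"\xbb" * PAGE_SIZE)

in_order = measure([page_a, page_b])
swapped = measure([page_b, page_a])

# Same pages, same contents, different order: the digests differ.
print(in_order == swapped)
```

In this model, handing over the same memory contents in a different order yields a
different digest, which is why "here's the contents of memory" is not enough unless
the image-building tooling and the kernel agree on the exact sequence.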