lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 17 Aug 2022 10:27:19 -0500
From:   Michael Roth <michael.roth@....com>
To:     Sean Christopherson <seanjc@...gle.com>
CC:     "Gupta, Pankaj" <pankaj.gupta@....com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Chao Peng <chao.p.peng@...ux.intel.com>,
        "Nikunj A. Dadhania" <nikunj@....com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        "Joerg Roedel" <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        <x86@...nel.org>, "H . Peter Anvin" <hpa@...or.com>,
        Hugh Dickins <hughd@...gle.com>,
        Jeff Layton <jlayton@...nel.org>,
        "J . Bruce Fields" <bfields@...ldses.org>,
        "Andrew Morton" <akpm@...ux-foundation.org>,
        Shuah Khan <shuah@...nel.org>,
        "Mike Rapoport" <rppt@...nel.org>,
        Steven Price <steven.price@....com>,
        "Maciej S . Szmigiero" <mail@...iej.szmigiero.name>,
        Vlastimil Babka <vbabka@...e.cz>,
        Vishal Annapurve <vannapurve@...gle.com>,
        Yu Zhang <yu.c.zhang@...ux.intel.com>, <luto@...nel.org>,
        <jun.nakajima@...el.com>, <dave.hansen@...el.com>,
        <ak@...ux.intel.com>, <david@...hat.com>, <aarcange@...hat.com>,
        <ddutile@...hat.com>, <dhildenb@...hat.com>,
        "Quentin Perret" <qperret@...gle.com>, <mhocko@...e.com>,
        Muchun Song <songmuchun@...edance.com>, <bharata@....com>,
        <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <linux-mm@...ck.org>, <linux-kselftest@...r.kernel.org>,
        <linux-api@...r.kernel.org>, <linux-doc@...r.kernel.org>,
        <qemu-devel@...gnu.org>, <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM
 guest private memory

On Tue, Aug 16, 2022 at 03:38:08PM +0000, Sean Christopherson wrote:
> On Tue, Aug 16, 2022, Gupta, Pankaj wrote:
> > 
> > > > > Actually the current version allows you to delay the allocation to a
> > > > > later time (e.g. page fault time) if you don't call fallocate() on the
> > > > > private fd. fallocate() is necessary in previous versions because we
> > > > > treat the existense in the fd as 'private' but in this version we track
> > > > > private/shared info in KVM so we don't rely on that fact from memory
> > > > > backstores.
> > > > 
> > > > Does this also mean reservation of guest physical memory with secure
> > > > processor (both for SEV-SNP & TDX) will also happen at page fault time?
> > > > 
> > > > Do we plan to keep it this way?
> > > 
> > > If you are talking about accepting memory by the guest, it is initiated by
> > > the guest and has nothing to do with page fault time vs fallocate()
> > > allocation of host memory. I mean acceptance happens after host memory
> > > allocation but they are not in lockstep, acceptance can happen much later.
> > 
> > No, I meant reserving guest physical memory range from hypervisor e.g with
> > RMPUpdate for SEV-SNP or equivalent at TDX side (PAMTs?).
> 
> As proposed, RMP/PAMT updates will occur in the fault path, i.e. there is no way
> for userspace to pre-map guest memory.

Hi Sean,

Currently I have the rmpupdate hook in KVM_MEMORY_ENCRYPT_{REG,UNREG}_REGION
ioctls, so that when the pages actually get faulted in they are already
in the expected state. I have userspace set up to call
KVM_MEMORY_ENCRYPT_* in response to explicit page state changes issued by
the guest, as well as in response to MEMORY_FAULT exits for implicit page
state changes.

Initially the private backing store may or may not be pre-fallocate()'d
depending on how userspace wants to handle it. If it's not
pre-fallocate()'d, then the pages don't get faulted in until the guest
does explicit page state changes (currently SNP guests will do this for all
memory at boot time, but with unaccepted memory patches for guest/ovmf
this will happen during guest run-time, would still allow us to make
efficient use of lazy-pinning support for shorter boot times).

If userspaces wants to pre-allocate, it can issue the fallocate() for
all the ranges up-front so it doesn't incur the cost during run-time.

Is that compatible with the proposed design?

Of course, for the initial encrypted payload, we would need to to issue
the KVM_MEMORY_ENCRYPT_{REG,UNREG}_REGION up-front. I'm doing that in
conjunction with the hack to allow pwrite() to memfd to pre-populate the
private pages before the in-place encryption that occurs when
SNP_LAUNCH_UPDATE is issued...

In the past you and Vishal suggested doing the copy from within
SNP_LAUNCH_UPDATE, which seems like a workable solution and something
we've been meaning to implement...

> 
> I think the best approach is to turn KVM_TDX_INIT_MEM_REGION into a generic
> vCPU-scoped ioctl() that allows userspace to pre-map guest memory.  Supporting
> initializing guest private memory with a source page can be implemented via a
> flag.  That also gives KVM line of sight to in-place "conversion", e.g. another
> flag could be added to say that the dest is also the source.

So is this proposed ioctl only intended to handle the initial encrypted
payload, and the KVM_MEMORY_ENCRYPT_{REG,UNREG}_REGION ioctls would
still be used for conversions post-boot?

If so, that seems reasonable, but I thought there was some consensus that
just handling it per-platform in, e.g., SNP_LAUNCH_UPDATE, was
sufficient for now until some additional need arose for a new interface.
Has something changed in the regard? Just want to understand the
motivations so we can plan accordingly.

Thanks!

-Mike

> 
> The TDX and SNP restrictions would then become addition restrictions on when
> initializing with a source is allowed (and VMs that don't have guest private
> memory wouldn't allow the flag at all).

Powered by blists - more mailing lists