lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 5 Apr 2022 18:30:35 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Quentin Perret <qperret@...gle.com>,
        Steven Price <steven.price@....com>,
        Chao Peng <chao.p.peng@...ux.intel.com>,
        kvm list <kvm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
        Linux API <linux-api@...r.kernel.org>, qemu-devel@...gnu.org,
        Paolo Bonzini <pbonzini@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        the arch/x86 maintainers <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, Hugh Dickins <hughd@...gle.com>,
        Jeff Layton <jlayton@...nel.org>,
        "J . Bruce Fields" <bfields@...ldses.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Rapoport <rppt@...nel.org>,
        "Maciej S . Szmigiero" <mail@...iej.szmigiero.name>,
        Vlastimil Babka <vbabka@...e.cz>,
        Vishal Annapurve <vannapurve@...gle.com>,
        Yu Zhang <yu.c.zhang@...ux.intel.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Andi Kleen <ak@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>,
        Marc Zyngier <maz@...nel.org>, Will Deacon <will@...nel.org>
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM
 guest private memory

On Tue, Apr 05, 2022, Andy Lutomirski wrote:
> On Tue, Apr 5, 2022, at 3:36 AM, Quentin Perret wrote:
> > On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote:
> >> The best I can come up with is a special type of shared page that is not
> >> GUP-able and maybe not even mmappable, having a clear option for
> >> transitions to fail, and generally preventing the nasty cases from
> >> happening in the first place.
> >
> > Right, that sounds reasonable to me.
> 
> At least as a v1, this is probably more straightforward than allowing mmap().
> Also, there's much to be said for a simpler, limited API, to be expanded if
> genuinely needed, as opposed to starting out with a very featureful API.

Regarding "genuinely needed", IMO the same applies to supporting this at all.
Without numbers from something at least approximating a real use case, we're just
speculating on which will be the most performant approach.

> >> Maybe there could be a special mode for the private memory fds in which
> >> specific pages are marked as "managed by this fd but actually shared".
> >> pread() and pwrite() would work on those pages, but not mmap().  (Or maybe
> >> mmap() but the resulting mappings would not permit GUP.)  And
> >> transitioning them would be a special operation on the fd that is specific
> >> to pKVM and wouldn't work on TDX or SEV.
> >
> > Aha, didn't think of pread()/pwrite(). Very interesting.
> 
> There are plenty of use cases for which pread()/pwrite()/splice() will be as
> fast or even much faster than mmap()+memcpy().

...

> resume guest
> *** host -> hypervisor -> guest ***
> Guest unshares the page.
> *** guest -> hypervisor ***
> Hypervisor removes PTE.  TLBI.
> *** hypervisor -> guest ***
> 
> Obviously considerable cleverness is needed to make a virt IOMMU like this
> work well, but still.
> 
> Anyway, my suggestion is that the fd backing proposal get slightly modified
> to get it ready for multiple subtypes of backing object, which should be a
> pretty minimal change.  Then, if someone actually needs any of this
> cleverness, it can be added later.  In the mean time, the
> pread()/pwrite()/splice() scheme is pretty good.

Tangentially related to getting private-fd ready for multiple things, what about
implementing the pread()/pwrite()/splice() scheme in pKVM itself?  I.e. read() on
the VM fd, with the offset corresponding to gfn in some way.

Ditto for mmap() on the VM fd, though that would require additional changes outside
of pKVM.

That would allow pKVM to support in-place conversions without the private-fd having
to differentiate between the type of protected VM, and without having to provide
new APIs from the private-fd.  TDX, SNP, etc... Just Work by not supporting the pKVM
APIs.

And assuming we get multiple consumers down the road, pKVM will need to be able to
communicate the "true" state of a page to other consumers, because in addition to
being a consumer, pKVM is also an owner/enforcer analogous to the TDX Module and
the SEV PSP.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ