linux-kernel - Re: [RFC PATCH 0/9] kvm: implement atomic memslot updates

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YzSxhHzgNKHL3Cvm@google.com>
Date:   Wed, 28 Sep 2022 20:41:40 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     Emanuele Giuseppe Esposito <eesposit@...hat.com>,
        David Hildenbrand <david@...hat.com>,
        Maxim Levitsky <mlevitsk@...hat.com>, kvm@...r.kernel.org,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org,
        Like Xu <like.xu.linux@...il.com>
Subject: Re: [RFC PATCH 0/9] kvm: implement atomic memslot updates

On Wed, Sep 28, 2022, Paolo Bonzini wrote:
> On 9/28/22 17:58, Sean Christopherson wrote:
> > I don't disagree that the memslots API is lacking, but IMO that is somewhat
> > orthogonal to fixing KVM x86's "code fetch to MMIO" mess.  Such a massive new API
> > should be viewed and prioritized as a new feature, not as a bug fix, e.g. I'd
> > like to have the luxury of being able to explore ideas beyond "let userspace
> > batch memslot updates", and I really don't want to feel pressured to get this
> > code reviewed and merge.
> 
> I absolutely agree that this is not a bugfix.  Most new features for KVM can
> be seen as bug fixes if you squint hard enough, but they're still features.

I guess I'm complaining that there isn't sufficient justification for this new
feature.  The cover letter provides a bug that would be fixed by having batched
updates, but as above, that's really due to deficiencies in a different KVM ABI.

Beyond that, there's no explanation of why this exact API is necessary, i.e. there
are no requirements given.

  - Why can't this be solved in userspace?

  - Is performance a concern?  I.e. are updates that need to be batched going to
    be high frequency operations?

  - What operations does userspace truly need?  E.g. if the only use case is to
    split/truncate/hole punch an existing memslot, can KVM instead provide a
    memslot flag and exit reason that allows kicking vCPUs to userspace if the
    memslot is accessed?  E.g. KVM_MEM_DISABLED that acts like an invalid memslot,
    but KVM exists with a dedicated exit reason instead of generating MMIO semantics.

  - What precisely needs to be atomic?  If updates only need to be "atomic" for
    an address space, does the API allowing mixing non-SMM and SMM memslots?

  - When is KVM required to invalidate and flush?  E.g. if a memslot is deleted
    and recreated with a different HVA, how does KVM ensure that there are no
    outstanding references to the old HVA without introducing non-determinstic
    behavior.  The current API works by forcing userspace to fully delete the
    memslot, i.e. KVM can ensure all references are gone in all TLBs before
    allowing userspace to create new, conflicting entries.  I don't see how this
    can work with batched updates.  The update needs to be "atomic", i.e. vCPUs
    must never see an invalid/deleted memslot, but if the memslot is writable,
    how does KVM prevent some writes from hitting the old HVA and some from hitting
    the new HVA without a quiescent period?

  - If a memslot is truncated while dirty logging is enabled, what happens to
    the bitmap?  Is it preserved?  Dropped?

Again, I completely agree that the current memslots API is far from perfect, but
I'm not convinced that simply extending the existing API to batch updates is the
best solution from a KVM perspective.

> > E.g. why do a batch update and not provide KVM_SET_ALL_USER_MEMORY_REGIONS to
> > do wholesale replacement?  That seems like it would be vastly simpler to handle
> > on KVM's end.  Or maybe there's a solution in the opposite direction, e.g. an
> > API that allows 1->N or N->1 conversions but not arbitrary batching.
> 
> Wholesale replacement was my first idea when I looked at the issue, I think
> at the end of 2020.  I never got to a full implementation, but my impression
> was that allocating/deallocating dirty bitmaps, rmaps etc. would make it any
> easier than arbitrary batch updates.

It's not obvious to me that the memslot metadata is going to be easy to handle
regardless of what we do.  E.g. I'm pretty sure that batching updates will "corrupt"
the dirty bitmap if a hole is punched in a memslot that's being dirty logged.