linux-kernel - Re: [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CADrL8HVpPOBkM4tD0xBpxWzYMiAoE5Gcg8Cg0=Xi5mvgeqR0gA@mail.gmail.com>
Date: Tue, 3 Sep 2024 14:40:36 -0700
From: James Houghton <jthoughton@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Oliver Upton <oliver.upton@...ux.dev>, Marc Zyngier <maz@...nel.org>, Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking
 rmaps outside of mmu_lock

On Tue, Sep 3, 2024 at 2:27 PM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Tue, Sep 03, 2024, James Houghton wrote:
> > On Fri, Aug 9, 2024 at 12:44 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > > +/*
> > > + * rmaps and PTE lists are mostly protected by mmu_lock (the shadow MMU always
> > > + * operates with mmu_lock held for write), but rmaps can be walked without
> > > + * holding mmu_lock so long as the caller can tolerate SPTEs in the rmap chain
> > > + * being zapped/dropped _while the rmap is locked_.
> > > + *
> > > + * Other than the KVM_RMAP_LOCKED flag, modifications to rmap entries must be
> > > + * done while holding mmu_lock for write.  This allows a task walking rmaps
> > > + * without holding mmu_lock to concurrently walk the same entries as a task
> > > + * that is holding mmu_lock but _not_ the rmap lock.  Neither task will modify
> > > + * the rmaps, thus the walks are stable.
> > > + *
> > > + * As alluded to above, SPTEs in rmaps are _not_ protected by KVM_RMAP_LOCKED,
> > > + * only the rmap chains themselves are protected.  E.g. holding an rmap's lock
> > > + * ensures all "struct pte_list_desc" fields are stable.
> > > + */
> > > +#define KVM_RMAP_LOCKED        BIT(1)
> > > +
> > > +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head)
> > > +{
> > > +       unsigned long old_val, new_val;
> > > +
> > > +       old_val = READ_ONCE(rmap_head->val);
> > > +       if (!old_val)
> > > +               return 0;
> >
> > I'm having trouble understanding how this bit works. What exactly is
> > stopping the rmap from being populated while we have it "locked"?
>
> Nothing prevents the 0=>1 transition, but that's a-ok because walking rmaps for
> aging only cares about existing mappings.  The key to correctness is that this
> helper returns '0' when there are no rmaps, i.e. the caller is guaranteed to do
> nothing and thus will never see any rmaps that come along in the future.
>
> > aren't holding the MMU lock at all in the lockless case, and given
> > this bit, it is impossible (I think?) for the MMU-write-lock-holding,
> > rmap-modifying side to tell that this rmap is locked.
> >
> > Concretely, my immediate concern is that we will still unconditionally
> > write 0 back at unlock time even if the value has changed.
>
> The "readonly" unlocker (not added in this patch) is a nop if the rmap was empty,
> i.e. wasn't actually locked.

Ah, that's what I was missing. Thanks! This all makes sense now.

>
> +static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head,
> +                                    unsigned long old_val)
> +{
> +       if (!old_val)
> +               return;
> +
> +       KVM_MMU_WARN_ON(old_val != (rmap_head->val & ~KVM_RMAP_LOCKED));
> +       WRITE_ONCE(rmap_head->val, old_val);
> +}
>
> The TODO in kvm_rmap_lock() pretty much sums things up: it's safe to call the
> "normal", non-readonly versions if and only if mmu_lock is held for write.
>
> +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head)
> +{
> +       /*
> +        * TODO: Plumb in @kvm and add a lockdep assertion that mmu_lock is
> +        *       held for write.
> +        */
> +       return __kvm_rmap_lock(rmap_head);
> +}