[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADrL8HVpPOBkM4tD0xBpxWzYMiAoE5Gcg8Cg0=Xi5mvgeqR0gA@mail.gmail.com>
Date: Tue, 3 Sep 2024 14:40:36 -0700
From: James Houghton <jthoughton@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Oliver Upton <oliver.upton@...ux.dev>, Marc Zyngier <maz@...nel.org>, Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking
rmaps outside of mmu_lock
On Tue, Sep 3, 2024 at 2:27 PM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Tue, Sep 03, 2024, James Houghton wrote:
> > On Fri, Aug 9, 2024 at 12:44 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > > +/*
> > > + * rmaps and PTE lists are mostly protected by mmu_lock (the shadow MMU always
> > > + * operates with mmu_lock held for write), but rmaps can be walked without
> > > + * holding mmu_lock so long as the caller can tolerate SPTEs in the rmap chain
> > > + * being zapped/dropped _while the rmap is locked_.
> > > + *
> > > + * Other than the KVM_RMAP_LOCKED flag, modifications to rmap entries must be
> > > + * done while holding mmu_lock for write. This allows a task walking rmaps
> > > + * without holding mmu_lock to concurrently walk the same entries as a task
> > > + * that is holding mmu_lock but _not_ the rmap lock. Neither task will modify
> > > + * the rmaps, thus the walks are stable.
> > > + *
> > > + * As alluded to above, SPTEs in rmaps are _not_ protected by KVM_RMAP_LOCKED,
> > > + * only the rmap chains themselves are protected. E.g. holding an rmap's lock
> > > + * ensures all "struct pte_list_desc" fields are stable.
> > > + */
> > > +#define KVM_RMAP_LOCKED BIT(1)
> > > +
> > > +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head)
> > > +{
> > > + unsigned long old_val, new_val;
> > > +
> > > + old_val = READ_ONCE(rmap_head->val);
> > > + if (!old_val)
> > > + return 0;
> >
> > I'm having trouble understanding how this bit works. What exactly is
> > stopping the rmap from being populated while we have it "locked"?
>
> Nothing prevents the 0=>1 transition, but that's a-ok because walking rmaps for
> aging only cares about existing mappings. The key to correctness is that this
> helper returns '0' when there are no rmaps, i.e. the caller is guaranteed to do
> nothing and thus will never see any rmaps that come along in the future.
>
> > aren't holding the MMU lock at all in the lockless case, and given
> > this bit, it is impossible (I think?) for the MMU-write-lock-holding,
> > rmap-modifying side to tell that this rmap is locked.
> >
> > Concretely, my immediate concern is that we will still unconditionally
> > write 0 back at unlock time even if the value has changed.
>
> The "readonly" unlocker (not added in this patch) is a nop if the rmap was empty,
> i.e. wasn't actually locked.
Ah, that's what I was missing. Thanks! This all makes sense now.
>
> +static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head,
> + unsigned long old_val)
> +{
> + if (!old_val)
> + return;
> +
> + KVM_MMU_WARN_ON(old_val != (rmap_head->val & ~KVM_RMAP_LOCKED));
> + WRITE_ONCE(rmap_head->val, old_val);
> +}
>
> The TODO in kvm_rmap_lock() pretty much sums things up: it's safe to call the
> "normal", non-readonly versions if and only if mmu_lock is held for write.
>
> +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head)
> +{
> + /*
> + * TODO: Plumb in @kvm and add a lockdep assertion that mmu_lock is
> + * held for write.
> + */
> + return __kvm_rmap_lock(rmap_head);
> +}
Powered by blists - more mailing lists