Message-ID: <aC3wFwFwFbLlsIft@x1.local>
Date: Wed, 21 May 2025 11:24:07 -0400
From: Peter Xu <peterx@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org, Yan Zhao <yan.y.zhao@...el.com>,
	Maxim Levitsky <mlevitsk@...hat.com>,
	Binbin Wu <binbin.wu@...ux.intel.com>,
	James Houghton <jthoughton@...gle.com>,
	Pankaj Gupta <pankaj.gupta@....com>
Subject: Re: [PATCH v3 0/6]  KVM: Dirty ring fixes and cleanups

On Wed, May 21, 2025 at 07:50:10AM -0700, Sean Christopherson wrote:
> On Tue, May 20, 2025, Peter Xu wrote:
> > On Tue, May 20, 2025 at 04:16:00PM -0700, Sean Christopherson wrote:
> > > On Tue, May 20, 2025, Peter Xu wrote:
> > > > On Fri, May 16, 2025 at 02:35:34PM -0700, Sean Christopherson wrote:
> > > > > Sean Christopherson (6):
> > > > >   KVM: Bound the number of dirty ring entries in a single reset at
> > > > >     INT_MAX
> > > > >   KVM: Bail from the dirty ring reset flow if a signal is pending
> > > > >   KVM: Conditionally reschedule when resetting the dirty ring
> > > > >   KVM: Check for empty mask of harvested dirty ring entries in caller
> > > > >   KVM: Use mask of harvested dirty ring entries to coalesce dirty ring
> > > > >     resets
> > > > >   KVM: Assert that slots_lock is held when resetting per-vCPU dirty
> > > > >     rings
> > > > 
> > > > For the last one, I'd think it's mainly because of the memslot accesses
> > > > (or CONFIG_LOCKDEP=y would already yell on resets?).
> > > 
> > > No?  If KVM only needed to ensure stable memslot accesses, then SRCU would suffice.
> > > It sounds like holding slots_lock may have been somewhat unintentional, but the
> > > reason KVM can't switch to SRCU is that doing so would break ordering, not because
> > > slots_lock is needed to protect the memslot accesses.
> > 
> > Hmm.. doesn't what you said mean exactly "yes"? :)
> > 
> > I mean, I would still expect lockdep to complain about this ioctl without the
> > slots_lock; please correct me if that's not the case.
> 
> Yes, one of slots_lock or SRCU needs to be held.
> 
> > And if using RCU is neither trivial nor necessary, then for now the
> > slots_lock is still required to make sure the memslot accesses are legal?
> 
> I don't follow this part.  The intent of the comment is to document why slots_lock
> is required, which is exceptional because memslot accesses for readers are protected
> by kvm->srcu.

I've always thought it's fine to take slots_lock for readers too.  RCU can
definitely be better in most cases, though.
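
To spell out the two reader-side patterns I have in mind (my own sketch, not
code from this series; the "access memslots here" comments just mark where
the memslot lookups would go):

	int idx;

	/* Reader via srcu: cheap, runs concurrently with other readers. */
	idx = srcu_read_lock(&kvm->srcu);
	/* ... access kvm_memslots(kvm) here ... */
	srcu_read_unlock(&kvm->srcu, idx);

	/* Reader via slots_lock: also legal, but it serializes against the
	 * memslot writers and against every other slots_lock holder. */
	mutex_lock(&kvm->slots_lock);
	/* ... access kvm_memslots(kvm) here ... */
	mutex_unlock(&kvm->slots_lock);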

> The fact that slots_lock also protects memslots is notable only
> because it makes acquiring kvm->srcu superfluous.  But grabbing kvm->srcu is still
> safe/legal/ok:
> 
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 1ba02a06378c..6bf4f9e2f291 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -121,18 +121,26 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
>         u64 cur_offset, next_offset;
>         unsigned long mask = 0;
>         struct kvm_dirty_gfn *entry;
> +       int idx;
>  
>         /*
>          * Ensure concurrent calls to KVM_RESET_DIRTY_RINGS are serialized,
>          * e.g. so that KVM fully resets all entries processed by a given call
> -        * before returning to userspace.  Holding slots_lock also protects
> -        * the various memslot accesses.
> +        * before returning to userspace.
>          */
>         lockdep_assert_held(&kvm->slots_lock);
>  
> +       /*
> +        * Holding slots_lock also protects the various memslot accesses, but
> +        * acquiring kvm->srcu for read here is still safe, just unnecessary.
> +        */
> +       idx = srcu_read_lock(&kvm->srcu);
> +
>         while (likely((*nr_entries_reset) < INT_MAX)) {
> -               if (signal_pending(current))
> +               if (signal_pending(current)) {
> +                       srcu_read_unlock(&kvm->srcu, idx);
>                         return -EINTR;
> +               }
>  
>                 entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
>  
> @@ -205,6 +213,8 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
>         if (mask)
>                 kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
>  
> +       srcu_read_unlock(&kvm->srcu, idx);
> +
>         /*
>          * The request KVM_REQ_DIRTY_RING_SOFT_FULL will be cleared
>          * by the VCPU thread next time when it enters the guest.
> --
> 
> And unless there are other behaviors that are protected by slots_lock (which is
> entirely possible), serializing the processing of each ring could be done via a

Yes.  I am not the original author, but from when I worked on this code I
don't remember anything relying on that.  Still, it's possible that it
serializes some operations under the hood (which would be a real side effect
of using this lock).
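
For example (paraphrased; the exact helpers differ across kernel versions),
the memslot update path in virt/kvm/kvm_main.c runs under the same lock, so a
reset holding slots_lock can never race a KVM_SET_USER_MEMORY_REGION update:

	mutex_lock(&kvm->slots_lock);
	r = __kvm_set_memory_region(kvm, mem);
	mutex_unlock(&kvm->slots_lock);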

> dedicated (for example only, the dedicated mutex could/should be per-vCPU, not
> global).
> 
> This diff in particular shows why I ordered and phrased the comment the way I
> did.  The blurb about protecting memslot accesses is purely a friendly reminder
> to readers.  The sole reason for an assert and comment is to call out the need
> for ordering.
> 
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 1ba02a06378c..92ac82b535fe 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -102,6 +102,8 @@ static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
>         return smp_load_acquire(&gfn->flags) & KVM_DIRTY_GFN_F_RESET;
>  }
>  
> +static DEFINE_MUTEX(per_ring_lock);
> +
>  int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
>                          int *nr_entries_reset)
>  {
> @@ -121,18 +123,22 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
>         u64 cur_offset, next_offset;
>         unsigned long mask = 0;
>         struct kvm_dirty_gfn *entry;
> +       int idx;
>  
>         /*
>          * Ensure concurrent calls to KVM_RESET_DIRTY_RINGS are serialized,
>          * e.g. so that KVM fully resets all entries processed by a given call
> -        * before returning to userspace.  Holding slots_lock also protects
> -        * the various memslot accesses.
> +        * before returning to userspace.
>          */
> -       lockdep_assert_held(&kvm->slots_lock);
> +       guard(mutex)(&per_ring_lock);
> +
> +       idx = srcu_read_lock(&kvm->srcu);
>  
>         while (likely((*nr_entries_reset) < INT_MAX)) {
> -               if (signal_pending(current))
> +               if (signal_pending(current)) {
> +                       srcu_read_unlock(&kvm->srcu, idx);
>                         return -EINTR;
> +               }
>  
>                 entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
>  
> @@ -205,6 +211,8 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
>         if (mask)
>                 kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
>  
> +       srcu_read_unlock(&kvm->srcu, idx);
> +
>         /*
>          * The request KVM_REQ_DIRTY_RING_SOFT_FULL will be cleared
>          * by the VCPU thread next time when it enters the guest.
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 571688507204..45729a6f6451 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4908,16 +4908,12 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
>         if (!kvm->dirty_ring_size)
>                 return -EINVAL;
>  
> -       mutex_lock(&kvm->slots_lock);
> -
>         kvm_for_each_vcpu(i, vcpu, kvm) {
>                 r = kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring, &cleared);
>                 if (r)
>                         break;
>         }
>  
> -       mutex_unlock(&kvm->slots_lock);
> -
>         if (cleared)
>                 kvm_flush_remote_tlbs(kvm);
> --
> 

I think we mostly agree with each other, and I don't see anything
controversial.

It's just that, for this path, switching to srcu may carry a slight risk of
breaking something that used to be serialized, as you said.  That said, I'd
be surprised if it did, even though aarch64 is normally trickier and it now
also supports the rings.  So it simply seems unnecessary to switch to srcu
yet, because we don't expect any concurrent writers anyway.
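
(If that serialization is still wanted without slots_lock, the per-vCPU lock
you mentioned could look roughly like the below.  Sketch only: the reset_lock
field is hypothetical and not in the current tree, and it would need to be
initialized wherever the ring is set up.)

	struct kvm_dirty_ring {
		/* ... existing fields (dirty_gfns, reset_index, size, ...) ... */
		struct mutex reset_lock;	/* hypothetical per-ring lock */
	};

	int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
				 int *nr_entries_reset)
	{
		int idx;

		/* Serialize resets of this ring only, instead of all rings. */
		guard(mutex)(&ring->reset_lock);

		/* Memslot accesses still need kvm->srcu once slots_lock is gone. */
		idx = srcu_read_lock(&kvm->srcu);
		/* ... harvest and reset the entries as today ... */
		srcu_read_unlock(&kvm->srcu, idx);

		return 0;
	}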

So, no strong opinion at all on how the comment should be laid out in the
last patch - please feel free to ignore my request.  But I hope I've stated
the fact correctly: in the current code base, the slots_lock is required to
access memslots safely when srcu isn't held.
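
(That's because the memslots pointer itself is fetched via
srcu_dereference_check(), which accepts either kvm->srcu or slots_lock being
held - paraphrased from include/linux/kvm_host.h, the exact form depends on
the kernel version:)

	static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id)
	{
		return srcu_dereference_check(kvm->memslots[as_id], &kvm->srcu,
				lockdep_is_held(&kvm->slots_lock) ||
				!refcount_read(&kvm->users_count));
	}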

Thanks,

-- 
Peter Xu

