linux-kernel - Re: [PATCH v2 3/5] KVM: Conditionally reschedule when resetting the dirty ring

Message-ID: <aCSns6Q5oTkdXUEe@google.com>
Date: Wed, 14 May 2025 07:24:51 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: James Houghton <jthoughton@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Peter Xu <peterx@...hat.com>, Yan Zhao <yan.y.zhao@...el.com>, 
	Maxim Levitsky <mlevitsk@...hat.com>
Subject: Re: [PATCH v2 3/5] KVM: Conditionally reschedule when resetting the
 dirty ring

On Tue, May 13, 2025, James Houghton wrote:
> On Tue, May 13, 2025 at 7:13 AM Sean Christopherson <seanjc@...gle.com> wrote:
> > On Mon, May 12, 2025, James Houghton wrote:
> > > On Thu, May 8, 2025 at 7:11 AM Sean Christopherson <seanjc@...gle.com> wrote:
> > > > ---
> > > >  virt/kvm/dirty_ring.c | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> > > > index e844e869e8c7..97cca0c02fd1 100644
> > > > --- a/virt/kvm/dirty_ring.c
> > > > +++ b/virt/kvm/dirty_ring.c
> > > > @@ -134,6 +134,16 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> > > >
> > > >                 ring->reset_index++;
> > > >                 (*nr_entries_reset)++;
> > > > +
> > > > +               /*
> > > > +                * While the size of each ring is fixed, it's possible for the
> > > > +                * ring to be constantly re-dirtied/harvested while the reset
> > > > +                * is in-progress (the hard limit exists only to guard against
> > > > +                * wrapping the count into negative space).
> > > > +                */
> > > > +               if (!first_round)
> > > > +                       cond_resched();
> > >
> > > Should we be dropping slots_lock here?
> >
> > Could we?  Yes.  Should we?  Eh.  I don't see any value in doing so, because in
> > practice, it's extremely unlikely anything will be waiting on slots_lock.
> >
> > kvm_vm_ioctl_reset_dirty_pages() operates on all vCPUs, i.e. there won't be
> > competing calls to reset other rings.  A well-behaved userspace won't be modifying
> > memslots or dirty logs, and won't be toggling nx_huge_pages.
> >
> > That leaves kvm_vm_ioctl_set_mem_attributes(), kvm_inhibit_apic_access_page(),
> > kvm_assign_ioeventfd(), snp_launch_update(), and coalesced IO/MMIO (un)registration.
> > Except for snp_launch_update(), those are all brutally slow paths, e.g. require
> > SRCU synchronization and/or zapping of SPTEs.  And snp_launch_update() is probably
> > fairly slow too.
> 
> Okay, that makes sense.

As discussed offlist, dropping slots_lock would also be functionally problematic,
as concurrent calls to KVM_RESET_DIRTY_RINGS could get interwoven, which could
result in one of the calls returning to userspace without actually completing the
reset, i.e. if a different task has reaped entries but not yet called
kvm_reset_dirty_gfn().
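
The interleaving described above can be sketched with a single-threaded toy
model in userspace C. Everything here is hypothetical (the toy_* names, the
two-phase split into "reap" and "apply", the fixed ring size); it is not KVM's
struct kvm_dirty_ring or its locking, only an illustration of why a second
caller could return before the first caller's deferred work has been applied.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_RING_SIZE 8

/* Hypothetical toy ring; not KVM's struct kvm_dirty_ring. */
struct toy_ring {
	bool harvested[TOY_RING_SIZE];   /* entry was collected by userspace */
	bool reset_done[TOY_RING_SIZE];  /* stand-in for kvm_reset_dirty_gfn() having run */
	size_t reset_index;              /* next entry to reap */
};

static void toy_init(struct toy_ring *ring, size_t nr_harvested)
{
	*ring = (struct toy_ring){ 0 };
	for (size_t i = 0; i < nr_harvested && i < TOY_RING_SIZE; i++)
		ring->harvested[i] = true;
}

/* Phase 1: consume harvested entries; returns how many were reaped. */
static size_t toy_reap(struct toy_ring *ring, size_t *first)
{
	size_t n = 0;

	*first = ring->reset_index;
	while (ring->reset_index < TOY_RING_SIZE &&
	       ring->harvested[ring->reset_index]) {
		ring->reset_index++;
		n++;
	}
	return n;
}

/* Phase 2: the deferred work for the entries reaped in phase 1. */
static void toy_apply(struct toy_ring *ring, size_t first, size_t n)
{
	for (size_t i = first; i < first + n; i++)
		ring->reset_done[i] = true;
}

/* True if every reaped entry has also had its deferred work applied. */
static bool toy_reset_complete(const struct toy_ring *ring)
{
	for (size_t i = 0; i < ring->reset_index; i++)
		if (!ring->reset_done[i])
			return false;
	return true;
}

/* Lock held across both phases: caller B runs only after A has finished. */
bool toy_serialized_b_sees_complete(void)
{
	struct toy_ring ring;
	size_t first, n;

	toy_init(&ring, 4);
	n = toy_reap(&ring, &first);        /* caller A, phase 1 */
	toy_apply(&ring, first, n);         /* caller A, phase 2 */
	n = toy_reap(&ring, &first);        /* caller B: nothing left to reap */
	toy_apply(&ring, first, n);
	return toy_reset_complete(&ring);   /* state when B returns */
}

/* Lock dropped between A's phases: B slips in and returns early. */
bool toy_interleaved_b_sees_complete(void)
{
	struct toy_ring ring;
	size_t a_first, a_n, b_first, b_n;
	bool seen_by_b;

	toy_init(&ring, 4);
	a_n = toy_reap(&ring, &a_first);    /* A reaps, then "drops the lock" */
	b_n = toy_reap(&ring, &b_first);    /* B finds nothing to reap */
	toy_apply(&ring, b_first, b_n);
	seen_by_b = toy_reset_complete(&ring);  /* B returns to userspace here */
	toy_apply(&ring, a_first, a_n);     /* A finishes too late for B */
	return seen_by_b;
}
```

In this model the serialized run reports a complete reset when B returns,
while the interleaved run does not, because A's reaped entries are still
pending at that point.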

> > And dropping slots_lock only makes any sense for non-preemptible kernels, because
> > preemptible kernels include an equivalent check in KVM_MMU_UNLOCK().
> 
> I'm not really sure what "equivalent check" you mean, sorry. For preemptible
> kernels, we could reschedule at any time, so dropping the slots_lock on a
> cond_resched() wouldn't do much, yeah. Hopefully that's partially what you
> mean.

Ya, that's essentially what I mean.  What I was referencing with KVM_MMU_UNLOCK()
is the explicit check for NEED_RESCHED that happens when the preempt count hits
'0' on preemptible kernels.
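
As a rough illustration of that check, here is a userspace toy model of a
preempt count. The toy_* names and the struct are hypothetical and heavily
simplified; the real kernel logic lives behind preempt_enable() and is
architecture-specific. The only point is that the reschedule happens on the
unlock that takes the count to zero, not on nested unlocks.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; not the kernel's real data structures. */
struct toy_cpu {
	int preempt_count;   /* > 0 means preemption is disabled */
	bool need_resched;   /* set when the current task should yield */
	int nr_schedules;    /* how many times the toy scheduler ran */
};

static void toy_schedule(struct toy_cpu *cpu)
{
	cpu->need_resched = false;
	cpu->nr_schedules++;
}

/* Taking a spinlock-like lock disables preemption. */
void toy_preempt_disable(struct toy_cpu *cpu)
{
	cpu->preempt_count++;
}

/* Releasing it re-enables preemption and checks for a pending reschedule. */
void toy_preempt_enable(struct toy_cpu *cpu)
{
	if (--cpu->preempt_count == 0 && cpu->need_resched)
		toy_schedule(cpu);
}

/* Two nested "locks", with a reschedule requested while both are held. */
void toy_nested_unlock_demo(int *after_inner, int *after_outer)
{
	struct toy_cpu cpu = { 0 };

	toy_preempt_disable(&cpu);        /* outer lock */
	toy_preempt_disable(&cpu);        /* nested lock */
	cpu.need_resched = true;          /* e.g. set from an interrupt */

	toy_preempt_enable(&cpu);         /* count is 1: no reschedule yet */
	*after_inner = cpu.nr_schedules;

	toy_preempt_enable(&cpu);         /* count hits 0: reschedule now */
	*after_outer = cpu.nr_schedules;
}
```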

> > It's also possible that dropping slots_lock in this case could be a net negative.
> > I don't think it's likely, but I don't think it's any more or less likely that
> > dropping slots_lock is a net positive.  Without performance data to guide us,
> > it'd be little more than a guess, and I really, really don't want to set a
> > precedent of dropping a mutex on cond_resched() without a very strong reason
> > for doing so.
> 
> Fair enough.
> 
> Also, while we're at it, could you add a
> `lockdep_assert_held(&kvm->slots_lock)` to this function? :) Not necessarily
> in this patch.

Heh, yep, my mind jumped to that as well.  I'll tack on a patch to add a lockdep
assertion, along with a comment explaining what all it protects.
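
For readers unfamiliar with the idea, the intent of such an assertion can be
sketched in userspace C. This is an analogy only: the toy_* names, the owner
tracking, and the error return are all assumptions made for illustration. In
the kernel, lockdep_assert_held() warns when lock debugging is enabled rather
than returning an error, and the actual follow-up patch may look different.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical mutex wrapper that remembers which thread holds it. */
struct toy_lock {
	pthread_mutex_t mutex;
	pthread_t owner;   /* valid only while held is true */
	bool held;
};

void toy_lock_init(struct toy_lock *l)
{
	pthread_mutex_init(&l->mutex, NULL);
	l->held = false;
}

void toy_lock_acquire(struct toy_lock *l)
{
	pthread_mutex_lock(&l->mutex);
	l->owner = pthread_self();
	l->held = true;
}

void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
	pthread_mutex_unlock(&l->mutex);
}

/*
 * Reliable only for answering "do I hold this lock?"; a thread that does
 * not hold the lock reads these fields without synchronization.
 */
bool toy_lock_held_by_me(const struct toy_lock *l)
{
	return l->held && pthread_equal(l->owner, pthread_self());
}

/* Hypothetical reset routine that states its locking rule up front. */
int toy_reset(struct toy_lock *slots_lock, int *nr_entries_reset)
{
	if (!toy_lock_held_by_me(slots_lock))
		return -1;   /* lockdep would warn here instead */

	(*nr_entries_reset)++;
	return 0;
}
```

A caller that forgets to take the lock gets -1 from toy_reset(), whereas a
caller that wraps the call in toy_lock_acquire()/toy_lock_release() gets 0.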