linux-kernel - Re: [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aC4taDzB45fUNQJr@google.com>
Date: Wed, 21 May 2025 12:45:44 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Peter Xu <peterx@...hat.com>, Maxim Levitsky <mlevitsk@...hat.com>, 
	Binbin Wu <binbin.wu@...ux.intel.com>, James Houghton <jthoughton@...gle.com>, 
	Pankaj Gupta <pankaj.gupta@....com>
Subject: Re: [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to
 coalesce dirty ring resets

On Wed, May 21, 2025, Sean Christopherson wrote:
> On Wed, May 21, 2025, Yan Zhao wrote:
> > On Fri, May 16, 2025 at 02:35:39PM -0700, Sean Christopherson wrote:
> > > @@ -141,42 +140,42 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> > >  		ring->reset_index++;
> > >  		(*nr_entries_reset)++;
> > >  
> > > -		/*
> > > -		 * While the size of each ring is fixed, it's possible for the
> > > -		 * ring to be constantly re-dirtied/harvested while the reset
> > > -		 * is in-progress (the hard limit exists only to guard against
> > > -		 * wrapping the count into negative space).
> > > -		 */
> > > -		if (!first_round)
> > > +		if (mask) {
> > > +			/*
> > > +			 * While the size of each ring is fixed, it's possible
> > > +			 * for the ring to be constantly re-dirtied/harvested
> > > +			 * while the reset is in-progress (the hard limit exists
> > > +			 * only to guard against the count becoming negative).
> > > +			 */
> > >  			cond_resched();
> > >  
> > > -		/*
> > > -		 * Try to coalesce the reset operations when the guest is
> > > -		 * scanning pages in the same slot.
> > > -		 */
> > > -		if (!first_round && next_slot == cur_slot) {
> > > -			s64 delta = next_offset - cur_offset;
> > > +			/*
> > > +			 * Try to coalesce the reset operations when the guest
> > > +			 * is scanning pages in the same slot.
> > > +			 */
> > > +			if (next_slot == cur_slot) {
> > > +				s64 delta = next_offset - cur_offset;
> > >  
> > > -			if (delta >= 0 && delta < BITS_PER_LONG) {
> > > -				mask |= 1ull << delta;
> > > -				continue;
> > > -			}
> > > +				if (delta >= 0 && delta < BITS_PER_LONG) {
> > > +					mask |= 1ull << delta;
> > > +					continue;
> > > +				}
> > >  
> > > -			/* Backwards visit, careful about overflows!  */
> > > -			if (delta > -BITS_PER_LONG && delta < 0 &&
> > > -			    (mask << -delta >> -delta) == mask) {
> > > -				cur_offset = next_offset;
> > > -				mask = (mask << -delta) | 1;
> > > -				continue;
> > > +				/* Backwards visit, careful about overflows! */
> > > +				if (delta > -BITS_PER_LONG && delta < 0 &&
> > > +				(mask << -delta >> -delta) == mask) {
> > > +					cur_offset = next_offset;
> > > +					mask = (mask << -delta) | 1;
> > > +					continue;
> > > +				}
> > >  			}
> > > -		}
> > >  
> > > -		/*
> > > -		 * Reset the slot for all the harvested entries that have been
> > > -		 * gathered, but not yet fully processed.
> > > -		 */
> > > -		if (mask)
> > > +			/*
> > > +			 * Reset the slot for all the harvested entries that
> > > +			 * have been gathered, but not yet fully processed.
> > > +			 */
> > >  			kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> > Nit and feel free to ignore it :)
> > 
> > Would it be better to move the "cond_resched()" to here, i.e., executing it for
> > at most every 64 entries?
> 
> Hmm, yeah, I think that makes sense.  The time spent manipulating the ring and
> mask+offset is quite trivial, so checking on every single entry is unnecessary.

Oh, no, scratch that.  Thankfully, past me explicitly documented this.  From
patch 3:

  Note!  Take care to check for reschedule even in the "continue" paths,
  as a pathological scenario (or malicious userspace) could dirty the same
  gfn over and over, i.e. always hit the continue path.

A batch isn't guaranteed to be flushed after processing 64 entries, it's only
flushed when an entry more than N gfns away is encountered.