linux-kernel - Re: [PATCH 1/5] KVM: Bound the number of dirty ring entries in a single reset at INT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z4S5U1B2/+mLPU/f@yzhao56-desk.sh.intel.com>
Date: Mon, 13 Jan 2025 14:57:23 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini
	<pbonzini@...hat.com>, <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	Peter Xu <peterx@...hat.com>, Maxim Levitsky <mlevitsk@...hat.com>
Subject: Re: [PATCH 1/5] KVM: Bound the number of dirty ring entries in a
 single reset at INT_MAX

On Mon, Jan 13, 2025 at 02:48:05PM +0800, Yan Zhao wrote:
> On Fri, Jan 10, 2025 at 05:04:05PM -0800, Sean Christopherson wrote:
> > Cap the number of ring entries that are reset in a single ioctl to INT_MAX
> > to ensure userspace isn't confused by a wrap into negative space, and so
> > that, in a truly pathological scenario, KVM doesn't miss a TLB flush due
> > to the count wrapping to zero.  While the size of the ring is fixed at
> > 0x10000 entries and KVM (currently) supports at most 4096, userspace is
> > allowed to harvest entries from the ring while the reset is in-progress,
> > i.e. it's possible for the ring to always harvested entries.
> > 
> > Opportunistically return an actual error code from the helper so that a
> > future fix to handle pending signals can gracefully return -EINTR.
> > 
> > Cc: Peter Xu <peterx@...hat.com>
> > Cc: Yan Zhao <yan.y.zhao@...el.com>
> > Cc: Maxim Levitsky <mlevitsk@...hat.com>
> > Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
> > Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> > ---
> >  include/linux/kvm_dirty_ring.h |  8 +++++---
> >  virt/kvm/dirty_ring.c          | 10 +++++-----
> >  virt/kvm/kvm_main.c            |  9 ++++++---
> >  3 files changed, 16 insertions(+), 11 deletions(-)
> > 
> > diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
> > index 4862c98d80d3..82829243029d 100644
> > --- a/include/linux/kvm_dirty_ring.h
> > +++ b/include/linux/kvm_dirty_ring.h
> > @@ -49,9 +49,10 @@ static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
> >  }
> >  
> >  static inline int kvm_dirty_ring_reset(struct kvm *kvm,
> > -				       struct kvm_dirty_ring *ring)
> > +				       struct kvm_dirty_ring *ring,
> > +				       int *nr_entries_reset)
> >  {
> > -	return 0;
> > +	return -ENOENT;
> >  }
> >  
> >  static inline void kvm_dirty_ring_push(struct kvm_vcpu *vcpu,
> > @@ -81,7 +82,8 @@ int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
> >   * called with kvm->slots_lock held, returns the number of
> >   * processed pages.
> >   */
> > -int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring);
> > +int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> > +			 int *nr_entries_reset);
> >  
> >  /*
> >   * returns =0: successfully pushed
> > diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> > index 7bc74969a819..2faf894dec5a 100644
> > --- a/virt/kvm/dirty_ring.c
> > +++ b/virt/kvm/dirty_ring.c
> > @@ -104,19 +104,19 @@ static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
> >  	return smp_load_acquire(&gfn->flags) & KVM_DIRTY_GFN_F_RESET;
> >  }
> >  
> > -int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
> > +int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> > +			 int *nr_entries_reset)
> >  {
> >  	u32 cur_slot, next_slot;
> >  	u64 cur_offset, next_offset;
> >  	unsigned long mask;
> > -	int count = 0;
> >  	struct kvm_dirty_gfn *entry;
> >  	bool first_round = true;
> >  
> >  	/* This is only needed to make compilers happy */
> >  	cur_slot = cur_offset = mask = 0;
> >  
> > -	while (true) {
> > +	while (likely((*nr_entries_reset) < INT_MAX)) {
> >  		entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
> >  
> >  		if (!kvm_dirty_gfn_harvested(entry))
> > @@ -129,7 +129,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
> >  		kvm_dirty_gfn_set_invalid(entry);
> >  
> >  		ring->reset_index++;
> > -		count++;
> > +		(*nr_entries_reset)++;
> >  		/*
> >  		 * Try to coalesce the reset operations when the guest is
> >  		 * scanning pages in the same slot.
> > @@ -166,7 +166,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
> >  
> >  	trace_kvm_dirty_ring_reset(ring);
> >  
> > -	return count;
> > +	return 0;
> >  }
> >  
> >  void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset)
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 9d54473d18e3..2d63b4d46ccb 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -4877,15 +4877,18 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
> >  {
> >  	unsigned long i;
> >  	struct kvm_vcpu *vcpu;
> > -	int cleared = 0;
> > +	int cleared = 0, r;
> >  
> >  	if (!kvm->dirty_ring_size)
> >  		return -EINVAL;
> >  
> >  	mutex_lock(&kvm->slots_lock);
> >  
> > -	kvm_for_each_vcpu(i, vcpu, kvm)
> > -		cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
> Previously "cleared" counts all cleared count in all vCPUs.
> 
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		r = kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring, &cleared);
> Here it's reset to the cleared count in the last vCPU, possibly causing loss of
> kvm_flush_remote_tlbs(().
Ah, sorry. it's not. (*nr_entries_reset)++ in each vCPU.

> 
> > +		if (r)
> > +			break;
> > +	}
> >  
> >  	mutex_unlock(&kvm->slots_lock);
> >  
> > -- 
> > 2.47.1.613.gc27f4b7a9f-goog
> > 
>