[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADrL8HURpnXgN0ux4sUk0nVze=A6d488i_ztiZTwGZUdDMoTvg@mail.gmail.com>
Date: Mon, 12 May 2025 15:02:00 -0700
From: James Houghton <jthoughton@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Peter Xu <peterx@...hat.com>, Yan Zhao <yan.y.zhao@...el.com>,
Maxim Levitsky <mlevitsk@...hat.com>
Subject: Re: [PATCH v2 3/5] KVM: Conditionally reschedule when resetting the
dirty ring
On Thu, May 8, 2025 at 7:11 AM Sean Christopherson <seanjc@...gle.com> wrote:
>
> When resetting a dirty ring, conditionally reschedule on each iteration
> after the first. The recently introduced hard limit mitigates the issue
> of an endless reset, but isn't sufficient to completely prevent RCU
> stalls, soft lockups, etc., nor is the hard limit intended to guard
> against such badness.
>
> Note! Take care to check for reschedule even in the "continue" paths,
> as a pathological scenario (or malicious userspace) could dirty the same
> gfn over and over, i.e. always hit the continue path.
>
> rcu: INFO: rcu_sched self-detected stall on CPU
> rcu: 4-....: (5249 ticks this GP) idle=51e4/1/0x4000000000000000 softirq=309/309 fqs=2563
> rcu: (t=5250 jiffies g=-319 q=608 ncpus=24)
> CPU: 4 UID: 1000 PID: 1067 Comm: dirty_log_test Tainted: G L 6.13.0-rc3-17fa7a24ea1e-HEAD-vm #814
> Tainted: [L]=SOFTLOCKUP
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x26/0x200 [kvm]
> Call Trace:
> <TASK>
> kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
> kvm_dirty_ring_reset+0x58/0x220 [kvm]
> kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
> __x64_sys_ioctl+0x8b/0xb0
> do_syscall_64+0x5b/0x160
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> </TASK>
> Tainted: [L]=SOFTLOCKUP
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x17/0x200 [kvm]
> Call Trace:
> <TASK>
> kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
> kvm_dirty_ring_reset+0x58/0x220 [kvm]
> kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
> __x64_sys_ioctl+0x8b/0xb0
> do_syscall_64+0x5b/0x160
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> </TASK>
>
> Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> virt/kvm/dirty_ring.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index e844e869e8c7..97cca0c02fd1 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -134,6 +134,16 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
>
> ring->reset_index++;
> (*nr_entries_reset)++;
> +
> + /*
> + * While the size of each ring is fixed, it's possible for the
> + * ring to be constantly re-dirtied/harvested while the reset
> + * is in-progress (the hard limit exists only to guard against
> + * wrapping the count into negative space).
> + */
> + if (!first_round)
> + cond_resched();
Should we be dropping slots_lock here?
It seems like we need to be holding slots_lock to call
kvm_reset_dirty_gfn(), but that's it. Userspace can already change the
memslots after enabling the dirty ring, so `entry->slot` can already
be stale, so dropping slots_lock for the cond_resched() seems harmless
(and better than not dropping it).
Powered by blists - more mailing lists