[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z4S65wQcApuITa7h@yzhao56-desk.sh.intel.com>
Date: Mon, 13 Jan 2025 15:04:07 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Peter Xu <peterx@...hat.com>, Maxim Levitsky
<mlevitsk@...hat.com>
Subject: Re: [PATCH 3/5] KVM: Conditionally reschedule when resetting the
dirty ring
On Fri, Jan 10, 2025 at 05:04:07PM -0800, Sean Christopherson wrote:
> When resetting a dirty ring, conditionally reschedule on each iteration
> after the first. The recently introduced hard limit mitigates the issue
> of an endless reset, but isn't sufficient to completely prevent RCU
> stalls, soft lockups, etc., nor is the hard limit intended to guard
> against such badness.
>
> Note! Take care to check for reschedule even in the "continue" paths,
> as a pathological scenario (or malicious userspace) could dirty the same
> gfn over and over, i.e. always hit the continue path.
>
> rcu: INFO: rcu_sched self-detected stall on CPU
> rcu: 4-....: (5249 ticks this GP) idle=51e4/1/0x4000000000000000 softirq=309/309 fqs=2563
> rcu: (t=5250 jiffies g=-319 q=608 ncpus=24)
> CPU: 4 UID: 1000 PID: 1067 Comm: dirty_log_test Tainted: G L 6.13.0-rc3-17fa7a24ea1e-HEAD-vm #814
> Tainted: [L]=SOFTLOCKUP
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x26/0x200 [kvm]
> Call Trace:
> <TASK>
> kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
> kvm_dirty_ring_reset+0x58/0x220 [kvm]
> kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
> __x64_sys_ioctl+0x8b/0xb0
> do_syscall_64+0x5b/0x160
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> </TASK>
> Tainted: [L]=SOFTLOCKUP
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x17/0x200 [kvm]
> Call Trace:
> <TASK>
> kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
> kvm_dirty_ring_reset+0x58/0x220 [kvm]
> kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
> __x64_sys_ioctl+0x8b/0xb0
> do_syscall_64+0x5b/0x160
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> </TASK>
>
> Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> virt/kvm/dirty_ring.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index a81ad17d5eef..37eb2b7142bd 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -133,6 +133,16 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
>
> ring->reset_index++;
> (*nr_entries_reset)++;
> +
> + /*
> + * While the size of each ring is fixed, it's possible for the
> + * ring to be constantly re-dirtied/harvested while the reset
> + * is in-progress (the hard limit exists only to guard against
> + * wrapping the count into negative space).
> + */
> + if (!first_round)
> + cond_resched();
> +
Will cond_resched() per entry be too frequent?
Could we combine the cond_resched() per ring? e.g.
if (count >= ring->soft_limit)
cond_resched();
or simply
while (count < ring->size) {
...
}
> /*
> * Try to coalesce the reset operations when the guest is
> * scanning pages in the same slot.
> --
> 2.47.1.613.gc27f4b7a9f-goog
>
Powered by blists - more mailing lists