linux-kernel - Re: [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots


Message-ID: <87r1ys7xpk.fsf@vitty.brq.redhat.com>
Date:   Tue, 18 Feb 2020 15:54:15 +0100
From:   Vitaly Kuznetsov <vkuznets@...hat.com>
To:     Wanpeng Li <kernellwp@...il.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Subject: Re: [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots

Wanpeng Li <kernellwp@...il.com> writes:

> From: Wanpeng Li <wanpengli@...cent.com>
>
> While vCPUs are being created, a kvmclock sync worker is queued on the global
> workqueue before each vCPU creation completes. Each worker is scheduled
> after a 300 * HZ delay, requests a kvmclock update for all vCPUs, and kicks them
> out. This gets worse as VMs scale up, because of the large number of vmexits.
> A single worker acting as the leader is enough to trigger the kvmclock sync
> request for all vCPUs.
>
> Signed-off-by: Wanpeng Li <wanpengli@...cent.com>
> ---
> v3 -> v4:
>  * check vcpu->vcpu_idx
>
>  arch/x86/kvm/x86.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fb5d64e..d0ba2d4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9390,8 +9390,9 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
>  	if (!kvmclock_periodic_sync)
>  		return;
>  
> -	schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
> -					KVMCLOCK_SYNC_PERIOD);
> +	if (vcpu->vcpu_idx == 0)
> +		schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
> +						KVMCLOCK_SYNC_PERIOD);
>  }
>  
>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)

Forgive my ignorance, but I was under the impression that
schedule_delayed_work() does nothing if the work is already
queued (see queue_delayed_work_on()), and we seem to be scheduling the
same work (&kvm->arch.kvmclock_sync_work), which is per-kvm (not
per-vcpu). Does the work actually finish executing before the next vCPU
is created? If not, why does the storm you describe happen?
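
To make the question concrete, here is a minimal userspace sketch of the
behaviour I have in mind. It is not kernel code: the model_* names are
hypothetical, a plain bool stands in for WORK_STRUCT_PENDING_BIT (the kernel
uses an atomic test_and_set_bit()), and the delay value is arbitrary. It
assumes the work never starts running while vCPUs are still being created,
which seems plausible with a 300 * HZ delay.

```c
#include <stdbool.h>

/* Userspace model only: one flag stands in for WORK_STRUCT_PENDING_BIT. */
struct model_delayed_work {
	bool pending;
	unsigned long delay;
};

/*
 * Models the pending check in queue_delayed_work_on(): the timer is
 * armed only if the work is not already pending. Returns true when the
 * work was queued, false when the call was a no-op.
 */
static bool model_schedule_delayed_work(struct model_delayed_work *dwork,
					unsigned long delay)
{
	if (dwork->pending)
		return false;

	dwork->pending = true;
	dwork->delay = delay;
	return true;
}

/*
 * Models nr_vcpus calls to kvm_arch_vcpu_postcreate() against the same
 * per-VM work item and counts how many calls actually queue it.
 */
static int model_create_vcpus(int nr_vcpus)
{
	struct model_delayed_work kvmclock_sync_work = { 0 };
	int queued = 0;
	int i;

	for (i = 0; i < nr_vcpus; i++) {
		/* 300 is a placeholder delay, not KVMCLOCK_SYNC_PERIOD itself. */
		if (model_schedule_delayed_work(&kvmclock_sync_work, 300))
			queued++;
	}

	return queued;
}
```

Under these assumptions, model_create_vcpus() reports a single queued work
item no matter how many vCPUs are created, which is why I would not expect
the check on vcpu->vcpu_idx to change the number of syncs.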

-- 
Vitaly