linux-kernel - Re: [PATCH] KVM: Disable wake-affine vCPU process to mitigate lock holder preemption

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190730120939.GM31381@hirez.programming.kicks-ass.net>
Date:   Tue, 30 Jul 2019 14:09:39 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Wanpeng Li <kernellwp@...il.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] KVM: Disable wake-affine vCPU process to mitigate lock
 holder preemption

On Tue, Jul 30, 2019 at 05:33:55PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@...cent.com>
> 
> Wake-affine is a feature inside scheduler which we attempt to make processes 
> running closely, it gains benefit mostly from cache-hit. When waker tries 
> to wakup wakee, it needs to select cpu to run wakee, wake affine heuristic 
> mays select the cpu which waker is running on currently instead of the prev 
> cpu which wakee was last time running. 
> 
> However, in multiple VMs over-subscribe virtualization scenario, it increases 
> the probability to incur vCPU stacking which means that the sibling vCPUs from 
> the same VM will be stacked on one pCPU. I test three 80 vCPUs VMs running on 
> one 80 pCPUs Skylake server(PLE is supported), the ebizzy score can increase 17% 
> after disabling wake-affine for vCPU process. 
> 
> When qemu/other vCPU inject virtual interrupt to guest through waking up one 
> sleeping vCPU, it increases the probability to stack vCPUs/qemu by scheduler
> wake-affine. vCPU stacking issue can greately inceases the lock synchronization 
> latency in a virtualized environment. This patch disables wake-affine vCPU 
> process to mitigtate lock holder preemption.
> 
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Paolo Bonzini <pbonzini@...hat.com>
> Cc: Radim Krčmář <rkrcmar@...hat.com>
> Signed-off-by: Wanpeng Li <wanpengli@...cent.com>
> ---
>  include/linux/sched.h | 1 +
>  kernel/sched/fair.c   | 3 +++
>  virt/kvm/kvm_main.c   | 1 +
>  3 files changed, 5 insertions(+)

> index 036be95..18eb1fa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5428,6 +5428,9 @@ static int wake_wide(struct task_struct *p)
>  	unsigned int slave = p->wakee_flips;
>  	int factor = this_cpu_read(sd_llc_size);
>  
> +	if (unlikely(p->flags & PF_NO_WAKE_AFFINE))
> +		return 1;
> +
>  	if (master < slave)
>  		swap(master, slave);
>  	if (slave < factor || master < slave * factor)

I intensely dislike how you misrepresent this patch as a KVM patch.

Also the above is very much not the right place, even if this PF_flag
were to live.