lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <19e0beb6-a732-ea1f-79a5-41be92569338@redhat.com>
Date:   Tue, 30 Jul 2019 13:46:42 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Wanpeng Li <kernellwp@...il.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Cc:     Radim Krčmář <rkrcmar@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] KVM: Disable wake-affine vCPU process to mitigate lock
 holder preemption

On 30/07/19 11:33, Wanpeng Li wrote:
> When qemu/other vCPU inject virtual interrupt to guest through waking up one 
> sleeping vCPU, it increases the probability to stack vCPUs/qemu by scheduler
> wake-affine. vCPU stacking issue can greately inceases the lock synchronization 
> latency in a virtualized environment. This patch disables wake-affine vCPU 
> process to mitigtate lock holder preemption.

There is no guarantee that the vCPU remains on the thread where it's
created, so the patch is not enough.

If many vCPUs are stacked on the same pCPU, why doesn't the wake_cap
kick in sooner or later?

Paolo

> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Paolo Bonzini <pbonzini@...hat.com>
> Cc: Radim Krčmář <rkrcmar@...hat.com>
> Signed-off-by: Wanpeng Li <wanpengli@...cent.com>
> ---
>  include/linux/sched.h | 1 +
>  kernel/sched/fair.c   | 3 +++
>  virt/kvm/kvm_main.c   | 1 +
>  3 files changed, 5 insertions(+)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 8dc1811..3dd33d8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1468,6 +1468,7 @@ extern struct pid *cad_pid;
>  #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
>  #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
>  #define PF_MEMALLOC_NOCMA	0x10000000	/* All allocation request will have _GFP_MOVABLE cleared */
> +#define PF_NO_WAKE_AFFINE	0x20000000	/* This thread should not be wake affine */
>  #define PF_FREEZER_SKIP		0x40000000	/* Freezer should not count it as freezable */
>  #define PF_SUSPEND_TASK		0x80000000      /* This thread called freeze_processes() and should not be frozen */
>  
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 036be95..18eb1fa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5428,6 +5428,9 @@ static int wake_wide(struct task_struct *p)
>  	unsigned int slave = p->wakee_flips;
>  	int factor = this_cpu_read(sd_llc_size);
>  
> +	if (unlikely(p->flags & PF_NO_WAKE_AFFINE))
> +		return 1;
> +
>  	if (master < slave)
>  		swap(master, slave);
>  	if (slave < factor || master < slave * factor)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 887f3b0..b9f75c3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2680,6 +2680,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>  
>  	mutex_unlock(&kvm->lock);
>  	kvm_arch_vcpu_postcreate(vcpu);
> +	current->flags |= PF_NO_WAKE_AFFINE;
>  	return r;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ