Date:	Tue, 5 Mar 2013 10:53:08 +0100
From:	Andrew Jones <drjones@...hat.com>
To:	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Avi Kivity <avi.kivity@...il.com>,
	Gleb Natapov <gleb@...hat.com>, Ingo Molnar <mingo@...hat.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Srikar <srikar@...ux.vnet.ibm.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
	KVM <kvm@...r.kernel.org>, Thomas Gleixner <tglx@...utronix.de>,
	Jiannan Ouyang <ouyang@...pitt.edu>,
	Chegu Vinod <chegu_vinod@...com>,
	"Andrew M. Theurer" <habanero@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>
Subject: Re: [PATCH RFC 0/2] kvm: Better yield_to candidate using preemption
 notifiers

On Mon, Mar 04, 2013 at 11:31:46PM +0530, Raghavendra K T wrote:
>  This patch series further narrows down the choice of a vcpu candidate to
> yield to in the PLE handler. The main idea is to record the preempted
> vcpus using preempt notifiers and to iterate over only those preempted
> vcpus in the handler. Note that the vcpus which were in a spinloop during
> pause loop exit are already filtered out.

The %improvement and patch series look good.
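
For anyone skimming the thread, my reading of the core of Method 1 is
roughly the sketch below. This is untested, and the 'preempted' field
name plus the exact hook bodies are my guesses from the cover letter,
not necessarily what the series actually does:

	/* in struct kvm_vcpu: set when the vcpu task is scheduled out
	 * while still runnable, i.e. a good yield candidate */
	bool preempted;

	static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
	{
		struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

		vcpu->preempted = false;	/* running again */
		kvm_arch_vcpu_load(vcpu, cpu);
	}

	static void kvm_sched_out(struct preempt_notifier *pn,
				  struct task_struct *next)
	{
		struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

		/* TASK_RUNNING here means we're being involuntarily
		 * preempted rather than blocking voluntarily */
		if (current->state == TASK_RUNNING)
			vcpu->preempted = true;
		kvm_arch_vcpu_put(vcpu);
	}

	/* in kvm_vcpu_on_spin(): skip vcpus that aren't preempted */
	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (!vcpu->preempted)
			continue;
		...
	}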

> 
> Thanks to Jiannan and Avi for bringing up the idea, and to Gleb and PeterZ
> for their valuable suggestions during the discussion.
> Thanks to Srikar for suggesting to avoid the rcu lock while checking task
> state, which has improved the overcommit cases.
> 
> There are basically two approaches for the implementation.
> 
> Method 1: Uses a per-vcpu preempt flag (this series).
> 
> Method 2: We keep a bitmap of preempted vcpus. Using this, we can easily
> iterate over the preempted vcpus.
> 
> Note that Method 2 needs an extra index variable to map each vcpu to its
> bit in the bitmap, and it also requires static vcpu allocation.

We definitely don't want something that requires static vcpu allocation.
I think it'd be better to add another counter for the vcpu bit assignment.
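
Something along these lines is what I had in mind (untested sketch; the
counter name is mine):

	/* in struct kvm: a monotonically increasing counter, so a
	 * vcpu's bit doesn't depend on its slot in kvm->vcpus[] */
	atomic_t preempt_bitmap_idx;

	/* in kvm_vm_ioctl_create_vcpu(), instead of reusing the
	 * online_vcpus index: */
	vcpu->idx = atomic_inc_return(&kvm->preempt_bitmap_idx) - 1;

The iterator would then need to scan up to the counter value rather than
online_vcpus, but nothing would tie the bitmap layout to static vcpu
allocation anymore.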

> 
> I am also posting the Method 2 approach for reference, in case it is of
> interest.

I guess the interest in Method 2 would come down to the perf numbers. Did
you try comparing Method 1 vs. Method 2?

> 
> Result: decent improvement for kernbench and ebizzy.
> 
> base = 3.8.0 + undercommit patches 
> patched = base + preempt patches
> 
> Tested on a 32-core (no HT) mx3850 machine with a 32-vcpu guest and 8GB RAM
> 
> --+-----------+-----------+-----------+------------+-----------+
>            kernbench (exec time in sec, lower is better)
> --+-----------+-----------+-----------+------------+-----------+
>         base       stdev     patched        stdev     %improve
> --+-----------+-----------+-----------+------------+-----------+
> 1x    47.0383     4.6977     44.2584      1.2899      5.90986
> 2x    96.0071     7.1873     91.2605      7.3567      4.94401
> 3x   164.0157    10.3613    156.6750     11.4267      4.47561
> 4x   212.5768    23.7326    204.4800     13.2908      3.80888
> --+-----------+-----------+-----------+------------+-----------+
> no ple kernbench 1x result for reference: 46.056133
> 
> --+-----------+-----------+-----------+------------+-----------+
>              ebizzy (records/sec, higher is better)
> --+-----------+-----------+-----------+------------+-----------+
>         base       stdev     patched        stdev     %improve
> --+-----------+-----------+-----------+------------+-----------+
> 1x  5609.2000    56.9343   6263.7000     64.7097     11.66833
> 2x  2071.9000   108.4829   2653.5000    181.8395     28.07085
> 3x  1557.4167   109.7141   1993.5000    166.3176     28.00043
> 4x  1254.7500    91.2997   1765.5000    237.5410     40.70532
> --+-----------+-----------+-----------+------------+-----------+
> no ple ebizzy 1x result for reference: 7394.9 records/sec
> 
> Please let me know if you have any suggestions and comments.
> 
> Raghavendra K T (2):
>    kvm: Record the preemption status of vcpus using preempt notifiers
>    kvm: Iterate over only vcpus that are preempted
> 
> ----
>  include/linux/kvm_host.h | 1 +
>  virt/kvm/kvm_main.c      | 7 +++++++
>  2 files changed, 8 insertions(+)
>  
> Reference patch for Method 2
> ---8<---
> Use preempt bitmap and optimize vcpu iteration using preempt notifiers
> 
> From: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
> 
> Record the preempted vcpus in a bitmap using preempt notifiers.
> Add the logic to iterate over only the preempted vcpus, thus making
> vcpu iteration fast.
> Thanks to Jiannan and Avi for initially proposing the patch, and to Gleb
> and Peter for their valuable suggestions.
> Thanks to Srikar for suggesting to remove the rcu lock while checking
> task state, which helped reduce the overcommit overhead.
> 
> Not-yet-signed-off-by: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
> ---
>  include/linux/kvm_host.h |    7 +++++++
>  virt/kvm/kvm_main.c      |   15 ++++++++++++---
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index cad77fe..8c4a2409 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -252,6 +252,7 @@ struct kvm_vcpu {
>  		bool dy_eligible;
>  	} spin_loop;
>  #endif
> +	int idx;
>  	struct kvm_vcpu_arch arch;
>  };
>  
> @@ -385,6 +386,7 @@ struct kvm {
>  	long mmu_notifier_count;
>  #endif
>  	long tlbs_dirty;
> +	DECLARE_BITMAP(preempt_bitmap, KVM_MAX_VCPUS);
>  };
>  
>  #define kvm_err(fmt, ...) \
> @@ -413,6 +415,11 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
>  	     (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
>  	     idx++)
>  
> +#define kvm_for_each_preempted_vcpu(idx, vcpup, kvm, n) \
> +	for (idx = find_first_bit(kvm->preempt_bitmap, KVM_MAX_VCPUS); \
> +	     idx < n && (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
> +	     idx = find_next_bit(kvm->preempt_bitmap, KVM_MAX_VCPUS, idx+1))
> +
>  #define kvm_for_each_memslot(memslot, slots)	\
>  	for (memslot = &slots->memslots[0];	\
>  	      memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index adc68fe..1db16b3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1770,10 +1770,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  	struct kvm_vcpu *vcpu;
>  	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
>  	int yielded = 0;
> +	int num_vcpus;
>  	int try = 3;
>  	int pass;
>  	int i;
> -
> +
> +	num_vcpus = atomic_read(&kvm->online_vcpus);
>  	kvm_vcpu_set_in_spin_loop(me, true);
>  	/*
>  	 * We boost the priority of a VCPU that is runnable but not
> @@ -1783,7 +1785,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  	 * We approximate round-robin by starting at the last boosted VCPU.
>  	 */
>  	for (pass = 0; pass < 2 && !yielded && try; pass++) {
> -		kvm_for_each_vcpu(i, vcpu, kvm) {
> +		kvm_for_each_preempted_vcpu(i, vcpu, kvm, num_vcpus) {
>  			if (!pass && i <= last_boosted_vcpu) {
>  				i = last_boosted_vcpu;
>  				continue;
> @@ -1878,6 +1880,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
>  static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>  {
>  	int r;
> +	int curr_idx;
>  	struct kvm_vcpu *vcpu, *v;
>  
>  	vcpu = kvm_arch_vcpu_create(kvm, id);
> @@ -1916,7 +1919,9 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>  		goto unlock_vcpu_destroy;
>  	}
>  
> -	kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
> +	curr_idx = atomic_read(&kvm->online_vcpus);
> +	kvm->vcpus[curr_idx] = vcpu;
> +	vcpu->idx = curr_idx;
>  	smp_wmb();
>  	atomic_inc(&kvm->online_vcpus);
>  
> @@ -2902,6 +2907,7 @@ struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
>  static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
>  {
>  	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
> +	clear_bit(vcpu->idx, vcpu->kvm->preempt_bitmap);
>  
>  	kvm_arch_vcpu_load(vcpu, cpu);
>  }
> @@ -2911,6 +2917,9 @@ static void kvm_sched_out(struct preempt_notifier *pn,
>  {
>  	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>  
> +	if (current->state == TASK_RUNNING)
> +		set_bit(vcpu->idx, vcpu->kvm->preempt_bitmap);
> +
>  	kvm_arch_vcpu_put(vcpu);
>  }
>  
> 