linux-kernel - Re: [PATCH v4 5/5] x86, kvm: support vcpu preempted check

Message-Id: <34aefe78-af86-19b3-9e2f-cb3ee6b5f735@linux.vnet.ibm.com>
Date:   Thu, 20 Oct 2016 02:45:33 +0800
From:   Pan Xinhui <xinhui@...ux.vnet.ibm.com>
To:     Radim Krčmář <rkrcmar@...hat.com>,
        Pan Xinhui <xinhui.pan@...ux.vnet.ibm.com>
Cc:     linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
        virtualization@...ts.linux-foundation.org,
        linux-s390@...r.kernel.org, xen-devel-request@...ts.xenproject.org,
        kvm@...r.kernel.org, benh@...nel.crashing.org, paulus@...ba.org,
        mpe@...erman.id.au, mingo@...hat.com, peterz@...radead.org,
        paulmck@...ux.vnet.ibm.com, will.deacon@....com,
        kernellwp@...il.com, jgross@...e.com, pbonzini@...hat.com,
        bsingharora@...il.com, boqun.feng@...il.com, borntraeger@...ibm.com
Subject: Re: [PATCH v4 5/5] x86, kvm: support vcpu preempted check


On 2016/10/20 01:24, Radim Krčmář wrote:
> 2016-10-19 06:20-0400, Pan Xinhui:
>> This fixes some lock holder preemption issues. Some lock
>> implementations spin in a loop before acquiring the lock itself.
>> The kernel currently has an interface, bool vcpu_is_preempted(int cpu). It
>> takes the cpu as a parameter and returns true if the cpu is preempted. The
>> kernel can then break out of the spin loops based on the return value of
>> vcpu_is_preempted().
>>
>> Since the kernel already uses this interface, let's support it.
>>
>> We use one field of struct kvm_steal_time to indicate whether a vcpu
>> is running or not.
>>
>> unix benchmark result:
>> host:  kernel 4.8.1, i5-4570, 4 cpus
>> guest: kernel 4.8.1, 8 vcpus
>>
>> test-case                              |    after-patch  |    before-patch
>> Execl Throughput                       |    18307.9 lps  |    11701.6 lps
>> File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
>> File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7 KBps
>> File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
>> Pipe Throughput                        | 11872208.7 lps  | 11855628.9 lps
>> Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9 lps
>> Process Creation                       |    29881.2 lps  |    28572.8 lps
>> Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4 lpm
>> Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9 lpm
>> System Call Overhead                   | 10385653.0 lps  | 10419979.0 lps
>>
>> Signed-off-by: Pan Xinhui <xinhui.pan@...ux.vnet.ibm.com>
>> ---
>> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
>> @@ -98,6 +98,10 @@ struct pv_time_ops {
>>  	unsigned long long (*steal_clock)(int cpu);
>>  };
>>
>> +struct pv_vcpu_ops {
>> +	bool (*vcpu_is_preempted)(int cpu);
>> +};
>> +
>
> (I would put it into pv_lock_ops to save the plumbing.)
>
Hi Radim,
Thanks for your reply.

Yes, a new struct adds unnecessary changed lines to the patch.
I did that only because I was not sure which existing xxx_ops I should place vcpu_is_preempted in.
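
To make the idea concrete, here is a rough userspace sketch of a hook living in a lock-ops table and of a spinner that stops once the lock owner's vcpu is preempted. Everything with a _sketch suffix, and the spin_result enum, is a hypothetical stand-in for illustration; it is not the kernel's pv_lock_ops or any existing lock implementation.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical per-cpu "preempted" flags, standing in for steal_time. */
static bool cpu_preempted_sketch[NR_CPUS_SKETCH];

static bool guest_vcpu_is_preempted_sketch(int cpu)
{
	return cpu_preempted_sketch[cpu];
}

/* Hypothetical lock-ops table; the real pv_lock_ops has other members. */
struct pv_lock_ops_sketch {
	bool (*vcpu_is_preempted)(int cpu);
};

static struct pv_lock_ops_sketch pv_lock_ops_sketch = {
	.vcpu_is_preempted = guest_vcpu_is_preempted_sketch,
};

enum spin_result {
	SPIN_LOCK_FREE,		/* the lock was observed free */
	SPIN_OWNER_PREEMPTED,	/* gave up: the owner's vcpu is not running */
	SPIN_BUDGET_EXHAUSTED,	/* gave up: spun max_spins times */
};

/* Spin on *locked, but bail out early if the owner cannot make progress. */
static enum spin_result spin_on_owner_sketch(const volatile int *locked,
					     int owner_cpu, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		if (!*locked)
			return SPIN_LOCK_FREE;
		if (pv_lock_ops_sketch.vcpu_is_preempted(owner_cpu))
			return SPIN_OWNER_PREEMPTED;
	}
	return SPIN_BUDGET_EXHAUSTED;
}
```

A caller that gets SPIN_OWNER_PREEMPTED would typically fall back to sleeping or yielding instead of burning more cycles.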

>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -45,7 +45,8 @@ struct kvm_steal_time {
>>  	__u64 steal;
>>  	__u32 version;
>>  	__u32 flags;
>> -	__u32 pad[12];
>> +	__u32 preempted;
>
> Why __u32 instead of __u8?
>
I thought it had to be 32-bit aligned...
Yes, a u8 is enough to store the preempted status.
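
As a quick check of the layout, here is a userspace sketch of what a u8 field could look like. It assumes the struct should keep its current 64-byte size, as the shrinking pad array in the patch suggests. The struct name and the u8_pad member are my own placeholders, not the uapi header.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the layout under discussion; not the uapi header. */
struct kvm_steal_time_sketch {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint8_t  preempted;	/* one byte is enough for a boolean state */
	uint8_t  u8_pad[3];	/* hypothetical explicit padding up to the next u32 */
	uint32_t pad[11];
};

/* 8 + 4 + 4 + 1 + 3 + 11 * 4 = 64 bytes, the same as with pad[12]. */
_Static_assert(sizeof(struct kvm_steal_time_sketch) == 64,
	       "steal time area is expected to stay 64 bytes");
_Static_assert(offsetof(struct kvm_steal_time_sketch, preempted) == 16,
	       "preempted is expected right after flags");
_Static_assert(offsetof(struct kvm_steal_time_sketch, pad) == 20,
	       "pad is expected to start on a 4-byte boundary");
```

With explicit byte padding the remaining pad words stay naturally aligned, so the compiler does not insert hidden padding.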

>> +	__u32 pad[11];
>>  };
>
> Please document the change in Documentation/virtual/kvm/msr.txt, section
> MSR_KVM_STEAL_TIME.
>
Okay, I totally forgot to do that. Thanks!

>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> @@ -415,6 +415,15 @@ void kvm_disable_steal_time(void)
>> +static bool kvm_vcpu_is_preempted(int cpu)
>> +{
>> +	struct kvm_steal_time *src;
>> +
>> +	src = &per_cpu(steal_time, cpu);
>> +
>> +	return !!src->preempted;
>> +}
>> +
>>  #ifdef CONFIG_SMP
>>  static void __init kvm_smp_prepare_boot_cpu(void)
>>  {
>> @@ -488,6 +497,8 @@ void __init kvm_guest_init(void)
>>  	kvm_guest_cpu_init();
>>  #endif
>>
>> +	pv_vcpu_ops.vcpu_is_preempted = kvm_vcpu_is_preempted;
>
> Would be nicer to assign conditionally in the KVM_FEATURE_STEAL_TIME
> block.  The steal_time structure has to be zeroed, so this code would
> work, but the native function (return false) is better if we know that
> the kvm_vcpu_is_preempted() would always return false anyway.
>
Yes, agreed. Will do that.

I once thought we could patch the code at runtime, replacing the binary code
"call 0xXXXXXXXX #pv_vcpu_ops.vcpu_is_preempted"
with
"xor eax, eax"
However, it is not worth doing that. The performance improvement would probably be very small.
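
Roughly, the conditional assignment could look like the userspace sketch below. The hook pointer, the per-cpu array, and guest_init_sketch() with its has_steal_time parameter are hypothetical names that model the KVM_FEATURE_STEAL_TIME check; this is not the real kvm_guest_init().

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical per-cpu steal time areas; only the field used here. */
struct steal_time_sketch {
	uint8_t preempted;
};

static struct steal_time_sketch steal_time_sketch[NR_CPUS_SKETCH];

/* Native default: without a hypervisor hint nothing is ever preempted. */
static bool native_vcpu_is_preempted_sketch(int cpu)
{
	(void)cpu;
	return false;
}

/* Guest version: report what the host wrote into the per-cpu area. */
static bool kvm_vcpu_is_preempted_sketch(int cpu)
{
	return steal_time_sketch[cpu].preempted != 0;
}

/* The hook starts out pointing at the native function. */
static bool (*vcpu_is_preempted_hook)(int cpu) = native_vcpu_is_preempted_sketch;

/* Install the guest version only when steal time is actually available. */
static void guest_init_sketch(bool has_steal_time)
{
	if (has_steal_time)
		vcpu_is_preempted_hook = kvm_vcpu_is_preempted_sketch;
}
```

Without the feature the hook keeps the native function, so callers never read a steal time area that the host does not update.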

> Old KVMs won't have the feature, so we could also assign only when KVM
> reports it, but that requires extra definitions and the performance gain
> is fairly small, so I'm ok with this.
>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> @@ -2057,6 +2057,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>>  		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
>>  		return;
>>
>> +	vcpu->arch.st.steal.preempted = 0;
>> +
>>  	if (vcpu->arch.st.steal.version & 1)
>>  		vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
>>
>> @@ -2812,6 +2814,16 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>> +	if (vcpu->arch.st.msr_val & KVM_MSR_ENABLED)
>> +		if (kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
>> +					&vcpu->arch.st.steal,
>> +					sizeof(struct kvm_steal_time)) == 0) {
>> +			vcpu->arch.st.steal.preempted = 1;
>> +			kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
>> +					&vcpu->arch.st.steal,
>> +					sizeof(struct kvm_steal_time));
>> +		}
>
> Please name this block of code.  Something like
>   kvm_steal_time_set_preempted(vcpu);
>
Yep, my code style is ugly.
Will do that.
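
Something along these lines, shown as a userspace model only. The vcpu_sketch type and the read/write helpers are hypothetical mocks of vcpu->arch.st and kvm_read_guest_cached()/kvm_write_guest_cached(); the real helper would of course use the kernel functions and types.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct steal_sketch {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint32_t preempted;
	uint32_t pad[11];
};

/* Hypothetical model of the per-vcpu steal time state. */
struct vcpu_sketch {
	bool steal_time_enabled;	 /* models msr_val & KVM_MSR_ENABLED */
	struct steal_sketch cache;	 /* models vcpu->arch.st.steal */
	struct steal_sketch *guest_area; /* models guest memory; NULL means unmapped */
};

/* Mock of a cached guest read: returns 0 on success, -1 on failure. */
static int read_guest_sketch(struct vcpu_sketch *v)
{
	if (!v->guest_area)
		return -1;
	memcpy(&v->cache, v->guest_area, sizeof(v->cache));
	return 0;
}

/* Mock of a cached guest write: returns 0 on success, -1 on failure. */
static int write_guest_sketch(struct vcpu_sketch *v)
{
	if (!v->guest_area)
		return -1;
	memcpy(v->guest_area, &v->cache, sizeof(v->cache));
	return 0;
}

/* The named helper: mark the vcpu as preempted in its steal time area. */
static void steal_time_set_preempted_sketch(struct vcpu_sketch *v)
{
	if (!v->steal_time_enabled)
		return;
	if (read_guest_sketch(v) != 0)
		return;
	v->cache.preempted = 1;
	write_guest_sketch(v);
}

/* The put path then shrinks to a single call. */
static void vcpu_put_sketch(struct vcpu_sketch *v)
{
	steal_time_set_preempted_sketch(v);
	/* ... rest of the put path ... */
}
```

Early returns also remove the nested if from the original block.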

Thanks,
Xinhui


> Thanks.
>