Date:   Tue, 11 Apr 2017 22:45:36 +0200
From:   Radim Krčmář <rkrcmar@...hat.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     James Hogan <james.hogan@...tec.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, Christoffer Dall <cdall@...aro.org>,
        Andrew Jones <drjones@...hat.com>,
        Marc Zyngier <marc.zyngier@....com>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Cornelia Huck <cornelia.huck@...ibm.com>,
        Paul Mackerras <paulus@...abs.org>
Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in
 kvm_make_all_cpus_request()

2017-04-11 13:25+0800, Paolo Bonzini:
> On 07/04/2017 05:02, James Hogan wrote:
>> - you'll no longer get IPIs if its in EXITING_GUEST_MODE (i.e. if you
>>   get two of these in quick succession only the first will wait for the
>>   IPI, which might work as long as they're already serialised but it
>>   still feels wrong).
> 
> But this is okay---avoiding multiple IPIs is the exact purpose of
> EXITING_GUEST_MODE.

I think this referred to the missed synchronization, in which case the
point is valid: the latter caller would assume that it can proceed to
reuse the memory even though the guest is still using it.

> There are evidently multiple uses of kvm_make_all_cpus_request, and we
> should avoid smp_call_function_many(..., true) if possible.  So perhaps:
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a17d78759727..20e3bd60bdda 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -169,7 +169,7 @@ static void ack_flush(void *_completed)
>  {
>  }
>  
> -bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> +bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req, bool wait)
>  {
>  	int i, cpu, me;
>  	cpumask_var_t cpus;
> @@ -182,18 +182,19 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		kvm_make_request(req, vcpu);
>  		cpu = vcpu->cpu;
> +		if (cpus == NULL || cpu == -1 || cpu == me)
> +			continue;
>  
>  		/* Set ->requests bit before we read ->mode. */
>  		smp_mb__after_atomic();
> -
> -		if (cpus != NULL && cpu != -1 && cpu != me &&
> -		      kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
> +		if (kvm_arch_vcpu_should_kick(vcpu) ||
> +		    (wait && vcpu->mode != OUTSIDE_GUEST_MODE))
>  			cpumask_set_cpu(cpu, cpus);
>  	}
>  	if (unlikely(cpus == NULL))
> -		smp_call_function_many(cpu_online_mask, ack_flush, NULL, 1);
> +		smp_call_function_many(cpu_online_mask, ack_flush, NULL, wait);
>  	else if (!cpumask_empty(cpus))
> -		smp_call_function_many(cpus, ack_flush, NULL, 1);
> +		smp_call_function_many(cpus, ack_flush, NULL, wait);
>  	else
>  		called = false;
>  	put_cpu();
> @@ -221,7 +222,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>  	 * kvm_make_all_cpus_request() reads vcpu->mode. We reuse that
>  	 * barrier here.
>  	 */
> -	if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
> +	if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH, true))
>  		++kvm->stat.remote_tlb_flush;
>  	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
>  }
> @@ -230,7 +231,7 @@ EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
>  
>  void kvm_reload_remote_mmus(struct kvm *kvm)
>  {
> -	kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
> +	/* FIXME, is wait=true really needed?  */

Probably not.  There are two uses:

in kvm_mmu_prepare_zap_page():
  The only change that happens between kvm_reload_remote_mmus() and
  kvm_flush_remote_tlbs() in kvm_mmu_commit_zap_page() is setting of
  sp->role.invalid -- synchronizing it doesn't prevent any race with
  READING_SHADOW_PAGE_TABLES mode, and the unconditional TLB flush is the
  important one.  I think that kvm_reload_remote_mmus doesn't even need
  to kick in this case.

in kvm_mmu_invalidate_zap_all_pages():
  Same situation: the guest cannot do an entry without increasing the
  generation number, but can enter READING_SHADOW_PAGE_TABLES mode
  between reload and flush.
  I think that we don't need the kick here either,

but my knowledge of this area is obviously lacking ...

> +	kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD, true);
>  }
>  
>  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
> 
> 
> Other users do not need wait=false.

You mean "wait=true"?

(It would be safer to assume they depend on waiting for the VM exit
 until proved otherwise ...)

> Or another idea is to embed wait in the request number, as suggested in the
> ARM thread, so that for example:

Right, I don't think that a TLB flush makes sense without
synchronization, so adding context sensitivity to the request number
seems like the way to go.

> - bits 0-4 = bit number in vcpu->requests
> 
> - bit 8 = wait when making request

Sounds good.  The single-target kvm_make_request() + kvm_vcpu_kick()
should use this as well.

> - bit 9 = kick after making request

Maybe add a bit mask to denote in which modes the kick/wait is necessary?

  bit 9  : IN_GUEST_MODE
  bit 10 : EXITING_GUEST_MODE
  bit 11 : READING_SHADOW_PAGE_TABLES

TLB_FLUSH would set bits 8-11.  IIUC, ARM has a use for requests that
need to make sure that the guest is not in guest mode before proceeding,
and those would set bits 8-10.

The common requests, "notice me as soon as possible", would set bit 9.
Bits 9-11 could also be consulted only when bit 8 is set, to make the
transition easier.  (Bits 9 and 10 could be squished together then as well.)
