linux-kernel - Re: [PATCH] KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <56DEEF69.20208@redhat.com>
Date:	Tue, 8 Mar 2016 16:27:37 +0100
From:	Paolo Bonzini <pbonzini@...hat.com>
To:	Lan Tianyu <tianyu.lan@...el.com>,
	Thomas Gleixner <tglx@...utronix.de>
Cc:	gleb@...nel.org, mingo@...hat.com, hpa@...or.com, x86@...nel.org,
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
	Xiao Guangrong <guangrong.xiao@...ux.intel.com>
Subject: Re: [PATCH] KVM: Remove redundant smp_mb() in the
 kvm_mmu_commit_zap_page()

On 08/03/2016 09:36, Lan Tianyu wrote:
> Summary about smp_mb()s we met in this thread. If misunderstood, please
> correct me. Thanks.
> 
> The smp_mb() in the kvm_flush_remote_tlbs() was introduced by the commit
> a4ee1ca4 and it seems to keep the order of reading and cmpxchg
> kvm->tlbs_dirty.
> 
>     Quote from Avi:
>     | I don't think we need to flush immediately; set a "tlb dirty" bit
> somewhere
>     | that is cleareded when we flush the tlb.
> kvm_mmu_notifier_invalidate_page()
>     | can consult the bit and force a flush if set.
> 
>     Signed-off-by: Xiao Guangrong <xiaoguangrong@...fujitsu.com>
>     Signed-off-by: Marcelo Tosatti <mtosatti@...hat.com>

Unfortunately that patch added a bad memory barrier: 1) it lacks a
comment; 2) it lacks obvious pairing; 3) it is an smp_mb() after a read,
so it's not even obvious that this memory barrier has to do with the
immediately preceding read of kvm->tlbs_dirty.  It also is not
documented in Documentation/virtual/kvm/mmu.txt (Guangrong documented
there most of his other work, back in 2013, but not this one :)).

The cmpxchg is ordered anyway against the read, because 1) x86 has
implicit ordering between earlier loads and later stores; 2) even
store-load barriers are unnecessary for accesses to the same variable
(in this case kvm->tlbs_dirty).

So offhand, I cannot say what it orders.  There are two possibilities:

1) it orders the read of tlbs_dirty with the read of mode.  In this
case, a smp_rmb() would have been enough, and it's not clear where is
the matching smp_wmb().

2) it orders the read of tlbs_dirty with the KVM_REQ_TLB_FLUSH request.
 In this case a smp_load_acquire would be better.

3) it does the same as kvm_mmu_commit_zap_page's smp_mb() but for other
callers of kvm_flush_remote_tlbs().  In this case, we know what's the
matching memory barrier (walk_shadow_page_lockless_*).

4) it is completely unnecessary.

My guess is either (2) or (3), but I am not sure.  We know that
anticipating kvm->tlbs_dirty should be safe; worst case, it causes the
cmpxchg to fail and an extra TLB flush on the next call to the MMU
notifier.  But I'm not sure of what happens if the processor moves the
read later.

> The smp_mb() in the kvm_mmu_commit_zap_page() was introduced by commit
> c142786c6 which was merged later than commit a4ee1ca4. It pairs with
> smp_mb() in the walk_shadow_page_lockless_begin/end() to keep order
> between modifications of the page tables and reading mode.

Yes; it also pairs with the smp_mb__after_srcu_unlock() in vcpu_enter_guest.

> The smp_mb() in the kvm_make_all_cpus_request() was introduced by commit
> 6b7e2d09. It keeps order between setting request bit and reading mode.

Yes.

>>  So you can:
>>
>> 1) add a comment to kvm_flush_remote_tlbs like:
>>
>> 	/*
>> 	 * We want to publish modifications to the page tables before reading
>> 	 * mode.  Pairs with a memory barrier in arch-specific code.
>> 	 * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
>> 	 * - powerpc: smp_mb in kvmppc_prepare_to_enter.
>> 	 */
>>
>> 2) add a comment to vcpu_enter_guest and kvmppc_prepare_to_enter, saying
>> that the memory barrier also orders the write to mode from any reads
>> to the page tables done while the VCPU is running.  In other words, on
>> entry a single memory barrier achieves two purposes (write ->mode before
>> reading requests, write ->mode before reading page tables).
> 
> These sounds good.
> 
>> The same should be true in kvm_flush_remote_tlbs().  So you may investigate
>> removing the barrier from kvm_flush_remote_tlbs, because
>> kvm_make_all_cpus_request already includes a memory barrier.  Like
>> Thomas suggested, leave a comment in kvm_flush_remote_tlbs(),
>> saying which memory barrier you are relying on and for what.
> 
> If we remove the smp_mb() in the kvm_flush_remote_tlbs(), we need to
> leave comments both in the kvm_flush_remote_tlbs() and
> kvm_mmu_commit_zap_page(), right?

Yes.  In fact, instead of removing it, I would change it to

	smp_mb__before_atomic();

with a comment that points to the addition of the barrier in commit
a4ee1ca4.  Unless Guangrong can enlighten us. :)

>>
>> And finally, the memory barrier in kvm_make_all_cpus_request can become
>> smp_mb__after_atomic, which is free on x86.
> 
> I found you have done this in your tree :)
> 
> commit 5b06b60b72b73cbe3805d2a64d223e0264bd0479
> Author: Paolo Bonzini <pbonzini@...hat.com>
> Date:   Wed Feb 24 12:44:17 2016 +0100
> 
>     KVM: mark memory barrier with smp_mb__after_atomic
> 
>     Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>

Yeah, but I reverted it because it makes sense to do it together with
your patch.

Paolo