linux-kernel - Re: [PATCH] kvm: don't lose the higher 32 bits of tlbs_dirty

Message-ID: <X9ee7RzW+Dhv1aoW@google.com>
Date:   Mon, 14 Dec 2020 09:20:45 -0800
From:   Sean Christopherson <seanjc@...gle.com>
To:     Lai Jiangshan <jiangshanlai@...il.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Lai Jiangshan <laijs@...ux.alibaba.com>,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH] kvm: don't lose the higher 32 bits of tlbs_dirty

On Sun, Dec 13, 2020, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@...ux.alibaba.com>
> 
> In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
> 	need_tlb_flush |= kvm->tlbs_dirty;
> with need_tlb_flush's type being int and tlbs_dirty's type being long.
> 
> It means that tlbs_dirty is always used as int and the higher 32 bits
> are useless.

It's probably worth noting in the changelog that it's _extremely_ unlikely this
bug can cause problems in practice.  It would require reading tlbs_dirty when it
is exactly a nonzero multiple of 2^32 (~4 billion), and KVM would need to be
using shadow paging or running a nested guest.
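
To illustrate the truncation, here is a minimal userspace sketch, not kernel
code.  The function names are hypothetical, `long long` stands in for the
kernel's 64-bit `long`, and the out-of-range conversion to `int` is assumed to
keep the low 32 bits, as gcc and clang do:

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the pre-patch logic: the 64-bit counter is
 * OR'ed into an int, so the result is converted back to int and only
 * the low 32 bits survive (implementation-defined, but this is what
 * gcc and clang do).
 */
static int flush_needed_truncating(int unmap_result, long long tlbs_dirty)
{
	int need_tlb_flush = unmap_result;

	need_tlb_flush |= tlbs_dirty;
	return need_tlb_flush != 0;
}

/*
 * Hypothetical stand-in for the suggested fix: test the counter
 * directly, so every bit of it is considered.
 */
static int flush_needed_direct(int unmap_result, long long tlbs_dirty)
{
	return unmap_result || tlbs_dirty;
}
```

With those assumptions, flush_needed_truncating(0, 1LL << 32) reports that no
flush is needed, whereas flush_needed_direct(0, 1LL << 32) reports that one is.
Any counter value with at least one of the low 32 bits set behaves the same in
both versions, which is why the window for hitting the bug is so narrow.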

> We can just change need_tlb_flush's type to long to
> make full use of tlbs_dirty.

Hrm, this does solve the problem, but I'm not a fan of continuing to use an
integer variable as a boolean.  Rather than propagate tlbs_dirty to
need_tlb_flush, what if this bug fix patch checks tlbs_dirty directly, and then
a follow-up patch converts need_tlb_flush to a bool and removes the unnecessary
initialization (see below)?

E.g. the net result of both patches would be:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3abcb2ce5b7d..93b6986d3dfc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -473,7 +473,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
                                        const struct mmu_notifier_range *range)
 {
        struct kvm *kvm = mmu_notifier_to_kvm(mn);
-       int need_tlb_flush = 0, idx;
+       bool need_tlb_flush;
+       int idx;

        idx = srcu_read_lock(&kvm->srcu);
        spin_lock(&kvm->mmu_lock);
@@ -483,11 +484,10 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
         * count is also read inside the mmu_lock critical section.
         */
        kvm->mmu_notifier_count++;
-       need_tlb_flush = kvm_unmap_hva_range(kvm, range->start, range->end,
-                                            range->flags);
-       need_tlb_flush |= kvm->tlbs_dirty;
+       need_tlb_flush = !!kvm_unmap_hva_range(kvm, range->start, range->end,
+                                              range->flags);
        /* we've to flush the tlb before the pages can be freed */
-       if (need_tlb_flush)
+       if (need_tlb_flush || kvm->tlbs_dirty)
                kvm_flush_remote_tlbs(kvm);

        spin_unlock(&kvm->mmu_lock);

Cc: stable@...r.kernel.org
Fixes: a4ee1ca4a36e ("KVM: MMU: delay flush all tlbs on sync_page path")

> Signed-off-by: Lai Jiangshan <laijs@...ux.alibaba.com>
> ---
>  virt/kvm/kvm_main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2541a17ff1c4..4e519a517e9f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -470,7 +470,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>  					const struct mmu_notifier_range *range)
>  {
>  	struct kvm *kvm = mmu_notifier_to_kvm(mn);
> -	int need_tlb_flush = 0, idx;
> +	long need_tlb_flush = 0;

need_tlb_flush doesn't need to be initialized here; it's explicitly set via the
call to kvm_unmap_hva_range().

> +	int idx;
>  
>  	idx = srcu_read_lock(&kvm->srcu);
>  	spin_lock(&kvm->mmu_lock);
> -- 
> 2.19.1.6.gb485710b
>