linux-kernel - Re: [PATCH] KVM: x86/mmu: Remove KVM MMU write lock when accessing indirect_shadow

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANgfPd9kKxq1146F3mX_u7KCC0HrWfgYrxZd6c9Dh7s19E4Eog@mail.gmail.com>
Date:   Mon, 5 Jun 2023 10:17:27 -0700
From:   Ben Gardon <bgardon@...gle.com>
To:     Jim Mattson <jmattson@...gle.com>
Cc:     Mingwei Zhang <mizhang@...gle.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: x86/mmu: Remove KVM MMU write lock when accessing indirect_shadow_pages

On Mon, Jun 5, 2023 at 9:55 AM Jim Mattson <jmattson@...gle.com> wrote:
>
> On Sun, Jun 4, 2023 at 5:43 PM Mingwei Zhang <mizhang@...gle.com> wrote:
> >
> > Remove KVM MMU write lock when accessing indirect_shadow_pages counter when
> > page role is direct because this counter value is used as a coarse-grained
> > heuristics to check if there is nested guest active. Racing with this
> > heuristics without mmu lock will be harmless because the corresponding
> > indirect shadow sptes for the GPA will either be zapped by this thread or
> > some other thread who has previously zapped all indirect shadow pages and
> > makes the value to 0.
> >
> > Because of that, remove the KVM MMU write lock pair to potentially reduce
> > the lock contension and improve the performance of nested VM. In addition
> > opportunistically change the comment of 'direct mmu' to make the
> > description consistent with other places.
> >
> > Reported-by: Jim Mattson <jmattson@...gle.com>
> > Signed-off-by: Mingwei Zhang <mizhang@...gle.com>
> > ---
> >  arch/x86/kvm/x86.c | 10 ++--------
> >  1 file changed, 2 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 5ad55ef71433..97cfa5a00ff2 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -8585,15 +8585,9 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >
> >         kvm_release_pfn_clean(pfn);
> >
> > -       /* The instructions are well-emulated on direct mmu. */
> > +       /* The instructions are well-emulated on Direct MMUs. */

Nit: Internally within Google, on older kernels, we have the "Direct
MMU" which was the precursor to the TDP MMU we all know and love. This
comment however does not refer to the Direct MMU. Direct here just
refers to the direct role bit being set. Since it's just descriptive,
direct should not be capitalized in this comment, so no reason to
change this line.

> >         if (vcpu->arch.mmu->root_role.direct) {
> > -               unsigned int indirect_shadow_pages;
> > -
> > -               write_lock(&vcpu->kvm->mmu_lock);
> > -               indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages;
> > -               write_unlock(&vcpu->kvm->mmu_lock);
> > -
> > -               if (indirect_shadow_pages)
> > +               if (READ_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
>
> I don't understand the need for READ_ONCE() here. That implies that
> there is something tricky going on, and I don't think that's the case.

Agree this doesn't need a READ_ONCE. Just a read is fine.
kvm_mmu_unprotect_page starts by acquiring the MMU lock, so there's
not much room to reorder anything anyway.

Thanks for sending a patch to fix this. The critical section of the
MMU lock here is small, but any lock acquisition in write mode can
mess up performance of otherwise happy read-mode uses.

>
> >                         kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> >
> >                 return true;
> >
> > base-commit: 31b4fc3bc64aadd660c5bfa5178c86a7ba61e0f7
> > --
> > 2.41.0.rc0.172.g3f132b7071-goog
> >