Message-ID: <YlR0a4PG5xzweeMZ@google.com>
Date: Mon, 11 Apr 2022 18:33:15 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: Mingwei Zhang <mizhang@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/6] KVM: x86/mmu: Properly account NX huge page
workaround for nonpaging MMUs
On Mon, Apr 11, 2022, Mingwei Zhang wrote:
> On Sat, Apr 09, 2022, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 671cfeccf04e..89df062d5921 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -191,6 +191,15 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > .user = err & PFERR_USER_MASK,
> > .prefetch = prefetch,
> > .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
> > +
> > + /*
> > + * Note, enforcing the NX huge page mitigation for nonpaging
> > + * MMUs (shadow paging, CR0.PG=0 in the guest) is completely
> > + * unnecessary. The guest doesn't have any page tables to
> > + * abuse and is guaranteed to switch to a different MMU when
> > + * CR0.PG is toggled on (may not always be guaranteed when KVM
> > + * is using TDP). See make_spte() for details.
> > + */
> > .nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),
>
> hmm. I think there could be a minor issue here (even in original code).
> The nx_huge_page_workaround_enabled is attached here with page fault.
> However, at the time of make_spte(), we call is_nx_huge_page_enabled()
> again. Since this function will directly check the module parameter,
> there might be a race condition here. eg., at the time of page fault,
> the workround was 'true', while by the time we reach make_spte(), the
> parameter was set to 'false'.
Toggling the mitigation invalidates and zaps all roots. Any page fault that acquires
mmu_lock after the toggling is guaranteed to see the correct value, and any page fault
that completed before kvm_mmu_zap_all_fast() is guaranteed to have its SPTEs zapped.
> I have not figured out what the side effect is. But I feel like the
> make_spte() should just follow the information in kvm_page_fault instead
> of directly querying the global config.
I started down this exact path :-) The problem is that, even without Ben's series,
KVM uses make_spte() for things other than page faults.