linux-kernel - Re: [PATCH 11/15] KVM: x86/MMU: Refactor vmx_get_mt

Message-ID: <YZZxivgSeGH4wZnB@google.com>
Date:   Thu, 18 Nov 2021 15:30:18 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     Ben Gardon <bgardon@...gle.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, Peter Xu <peterx@...hat.com>,
        Peter Shier <pshier@...gle.com>,
        David Matlack <dmatlack@...gle.com>,
        Mingwei Zhang <mizhang@...gle.com>,
        Yulei Zhang <yulei.kernel@...il.com>,
        Wanpeng Li <kernellwp@...il.com>,
        Xiao Guangrong <xiaoguangrong.eric@...il.com>,
        Kai Huang <kai.huang@...el.com>,
        Keqian Zhu <zhukeqian1@...wei.com>,
        David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH 11/15] KVM: x86/MMU: Refactor vmx_get_mt_mask

On Thu, Nov 18, 2021, Paolo Bonzini wrote:
> On 11/16/21 00:45, Ben Gardon wrote:
> > Remove the gotos from vmx_get_mt_mask to make it easier to separate out
> > the parts which do not depend on vcpu state.
> > 
> > No functional change intended.
> > 
> > 
> > Signed-off-by: Ben Gardon <bgardon@...gle.com>
> 
> Queued, thanks (with a slightly edited commit message; the patch is a
> simplification anyway).

Don't know what message you've queued, but just in case you kept some of the original,
can you further edit it to remove any snippets that mention separating out the parts
that don't depend on vCPU state?

IMO, we should not separate vmx_get_mt_mask() into per-VM and per-vCPU variants,
because the per-vCPU variant is a lie.  The memtype of a SPTE is not tracked anywhere,
which means that if the guest has non-uniform CR0.CD/NW or MTRR settings, KVM will
happily let the guest consume SPTEs with the incorrect memtype.  In practice, this
isn't an issue because no sane BIOS or kernel uses per-CPU MTRRs, nor do they have
DMA operations running while the cacheability state is in flux.

If we really want to make this state per-vCPU, KVM would need to incorporate the
CR0.CD and MTRR settings in kvm_mmu_page_role.  For MTRRs in particular, the worst
case scenario is that every vCPU has different MTRR settings, which means that
kvm_mmu_page_role would need to be expanded by 10 bits in order to track every
possible vcpu_idx (currently capped at 1024).
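
The 10-bit figure is just the number of bits needed to encode an index in
[0, 1024).  A minimal sketch of that arithmetic follows; index_bits() is a
hypothetical helper written for illustration, not an existing KVM function, and
the 1024 cap is the value quoted above.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of KVM): return the number of bits needed
 * to encode any index in the range [0, count).
 */
static unsigned int index_bits(uint32_t count)
{
	unsigned int bits = 0;

	/* Grow the field until 2^bits distinct values cover 'count' indices. */
	while (bits < 32 && (UINT32_C(1) << bits) < count)
		bits++;

	return bits;
}
```

With a cap of 1024 vCPU indices, index_bits(1024) evaluates to 10, i.e. the
number of extra bits the role would have to carry in the worst case.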

So unless we want to massively complicate kvm_mmu_page_role and gfn_track for a
scenario no one cares about, I would strongly prefer to acknowledge that KVM assumes
memtypes are a per-VM property, e.g. on top:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 77f45c005f28..8a84d30f1dbd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6984,8 +6984,9 @@ static int __init vmx_check_processor_compat(void)
        return 0;
 }

-static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+static u64 vmx_get_mt_mask(struct kvm *kvm, gfn_t gfn, bool is_mmio)
 {
+       struct kvm_vcpu *vcpu;
        u8 cache;

        /* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
@@ -7009,11 +7010,15 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
        if (is_mmio)
                return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

-       if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
+       if (!kvm_arch_has_noncoherent_dma(kvm))
                return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;

+       vcpu = kvm_get_vcpu_by_id(kvm, 0);
+       if (KVM_BUG_ON(!vcpu, kvm))
+               return 0;
+
        if (kvm_read_cr0(vcpu) & X86_CR0_CD) {
-               if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
+               if (kvm_check_has_quirk(kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
                        cache = MTRR_TYPE_WRBACK;
                else
                        cache = MTRR_TYPE_UNCACHABLE;