Message-ID: <YZvloswO5g/o02V6@google.com>
Date: Mon, 22 Nov 2021 18:46:58 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: Ben Gardon <bgardon@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, Peter Xu <peterx@...hat.com>,
Peter Shier <pshier@...gle.com>,
David Matlack <dmatlack@...gle.com>,
Mingwei Zhang <mizhang@...gle.com>,
Yulei Zhang <yulei.kernel@...il.com>,
Wanpeng Li <kernellwp@...il.com>,
Xiao Guangrong <xiaoguangrong.eric@...il.com>,
Kai Huang <kai.huang@...el.com>,
Keqian Zhu <zhukeqian1@...wei.com>,
David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH 11/15] KVM: x86/MMU: Refactor vmx_get_mt_mask

On Mon, Nov 22, 2021, Ben Gardon wrote:
> On Fri, Nov 19, 2021 at 1:03 AM Paolo Bonzini <pbonzini@...hat.com> wrote:
> >
> > On 11/18/21 16:30, Sean Christopherson wrote:
> > > If we really want to make this state per-vCPU, KVM would need to incorporate the
> > > CR0.CD and MTRR settings in kvm_mmu_page_role. For MTRRs in particular, the worst
> > > case scenario is that every vCPU has different MTRR settings, which means that
> > > kvm_mmu_page_role would need to be expanded by 10 bits in order to track every
> > > possible vcpu_idx (currently capped at 1024).
> >
> > Yes, that's insanity. I was also a bit skeptical about Ben's try_get_mt_mask callback,
> > but this would be much much worse.
>
> Yeah, the implementation of that felt a bit kludgy to me too, but
> refactoring the handling of all those CR bits was way more complex
> than I wanted to handle in this patch set.
> I'd love to see some of those CR0 / MTRR settings be set on a VM basis
> and enforced as uniform across vCPUs.

Architecturally, we can't do that. Even a perfectly well-behaved guest will have
(small) periods where the BSP has different settings than APs. And it's technically
legal to have non-uniform MTRR and CR0.CD/NW configurations, even though no modern
BIOS/kernel does that. Except for non-coherent DMA, it's a moot point because KVM
can simply ignore guest cacheability settings.
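
For illustration, a simplified, hypothetical sketch of that memtype decision
(helper name invented here; the real vmx_get_mt_mask() also handles the CD/NW
quirk and the EPT IPAT/shift encoding, all elided):

static u8 ept_memtype_sketch(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
{
	if (is_mmio)
		return MTRR_TYPE_UNCACHABLE;

	/* No non-coherent DMA => guest cacheability is irrelevant, force WB. */
	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
		return MTRR_TYPE_WRBACK;

	/* CR0.CD set => the vCPU is running with caching disabled. */
	if (kvm_read_cr0(vcpu) & X86_CR0_CD)
		return MTRR_TYPE_UNCACHABLE;

	/* Otherwise honor the guest's MTRRs for this gfn. */
	return kvm_mtrr_get_guest_memory_type(vcpu, gfn);
}
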
> Looking up vCPU 0 and basing things on that feels extra hacky though,
> especially if we're still not asserting uniformity of settings across
> vCPUs.

IMO, it's marginally less hacky than what KVM has today as it allows KVM's behavior
to be clearly and sanely stated, e.g. KVM uses vCPU0's cacheability settings when
mapping non-coherent DMA. Compare that with today's behavior where the cacheability
settings depend on which vCPU first faulted in the address for a given MMU role and
instance of the associated root, and whether other vCPUs share an MMU role/root.
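
Purely as a hypothetical sketch (names and plumbing invented for illustration),
"use vCPU0's settings" for the non-coherent DMA case could be as simple as
keying everything off kvm_get_vcpu(kvm, 0):

static u8 noncoherent_dma_memtype_sketch(struct kvm *kvm, gfn_t gfn)
{
	/*
	 * Always consult vCPU0 so that the resulting memtype doesn't depend
	 * on which vCPU happens to fault in the gfn first.
	 */
	struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);

	if (!vcpu)
		return MTRR_TYPE_WRBACK;

	if (kvm_read_cr0(vcpu) & X86_CR0_CD)
		return MTRR_TYPE_UNCACHABLE;

	return kvm_mtrr_get_guest_memory_type(vcpu, gfn);
}
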
> If we need to track that state to accurately virtualize the hardware
> though, that would be unfortunate.