Message-ID: <b44f75f9d6c66f33cab85cbe463cc388d48ac7eb.camel@infradead.org>
Date:   Mon, 14 Nov 2022 09:36:14 -0800
From:   David Woodhouse <dwmw2@...radead.org>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "mhal@...x.co" <mhal@...x.co>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "Durrant, Paul" <pdurrant@...zon.co.uk>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Kaya, Metin" <metikaya@...zon.co.uk>
Subject: Re: [EXTERNAL][PATCH 03/16] KVM: x86: set gfn-to-pfn cache length
 consistently with VM word size

On Mon, 2022-11-14 at 16:33 +0000, Sean Christopherson wrote:
> On Mon, Nov 14, 2022, Woodhouse, David wrote:
> > Most other data structures, including the pvclock info (both Xen and
> > native KVM), could potentially cross page boundaries. And isn't that
> > also true for things that we'd want to use the GPC for in nesting?
> 
> Off the top of my head, no.  Except for MSR and I/O permission bitmaps, which
> are >4KiB, things that are referenced by physical address are <=4KiB and must be
> naturally aligned.  nVMX does temporarily map L1's MSR bitmap, but that could be
> split into two separate mappings if necessary.
> 
> > For the runstate info I suggested reverting commit a795cd43c5b5 but
> > that doesn't actually work because it still has the same problem. Even
> > the gfn-to-hva cache still only really works for a single page, and
> > things like kvm_write_guest_offset_cached() will fall back to using
> > kvm_write_guest() in the case where it crosses a page boundary.
> > 
> > I'm wondering if the better fix is to allow the GPC to map more than
> > one page.
> 
> I agree that KVM should drop the "no page splits" restriction, but I don't think
> that would necessarily solve all KVM Xen issues.  KVM still needs to precisely
> handle the "correct" struct size, e.g. if one of the structs is placed at the very
> end of the page such that the smaller compat version doesn't split a page but the
> 64-bit version does.
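
To make that failure mode concrete, here's a minimal untested sketch of
the check (standalone C; the struct sizes are placeholders, not the real
Xen layouts):

        #include <stdbool.h>
        #include <stdint.h>

        #define PAGE_SIZE               4096ULL
        /* Hypothetical sizes: the compat (32-bit) layout is smaller
         * than the 64-bit layout of the same shared structure. */
        #define COMPAT_STRUCT_SIZE      44ULL
        #define NATIVE_STRUCT_SIZE      48ULL

        static bool crosses_page(uint64_t gpa, uint64_t len)
        {
                return (gpa & (PAGE_SIZE - 1)) + len > PAGE_SIZE;
        }

        /* A struct placed near the end of a page can fit in the compat
         * layout yet split in the 64-bit layout, so the cache length
         * must track the guest's current word size. */
        static bool compat_fits_but_native_splits(uint64_t gpa)
        {
                return !crosses_page(gpa, COMPAT_STRUCT_SIZE) &&
                        crosses_page(gpa, NATIVE_STRUCT_SIZE);
        }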

I think we can be more explicit that the guest 'long' mode shall never
change while anything is mapped. Xen automatically detects that a guest
is in 64-bit mode very early on, either in the first 'fill the
hypercall page' MSR write, or when setting HVM_PARAM_CALLBACK_IRQ to
configure interrupt routing.

Strictly speaking, a guest could put itself into 32-bit mode and set
HVM_PARAM_CALLBACK_IRQ *again*. Xen would only update the wallclock
time in that case, and would make no attempt to convert anything else. I
don't think we need to replicate that.

On kexec/soft reset it could go back to 32-bit mode, but the soft reset
unmaps everything so that's OK.

I looked at making the GPC handle multiple pages, but I can't see how
to do it sanely for the IOMEM case. vmap() takes a list of *pages*, not
PFNs, and memremap_pages() is... overly complex.
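
Roughly, the page-backed case would look like this (untested sketch,
error handling trimmed), and it falls apart as soon as a pfn has no
struct page behind it:

        /* Map 'npages' consecutive cached pfns into one contiguous
         * kernel virtual range. Only works when every pfn is backed
         * by a struct page; a pure IOMEM pfn has no struct page to
         * hand to vmap(). */
        static void *gpc_map_pages(kvm_pfn_t *pfns, int npages)
        {
                struct page **pages;
                void *khva;
                int i;

                pages = kmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
                if (!pages)
                        return NULL;

                for (i = 0; i < npages; i++) {
                        if (!pfn_valid(pfns[i])) {      /* IOMEM: give up */
                                kfree(pages);
                                return NULL;
                        }
                        pages[i] = pfn_to_page(pfns[i]);
                }

                khva = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
                kfree(pages);
                return khva;
        }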

But if we can reduce it to *just* the runstate info that potentially
needs >1 page, then we can probably handle that by using two GPC (or
maybe GHC) caches for it.
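
Something like this untested sketch for the init side, splitting the
area at the page boundary so each cache covers a single page (the name
and signature are illustrative, not a real proposal):

        static int runstate_init_split(struct kvm *kvm,
                                       struct gfn_to_hva_cache *ghc1,
                                       struct gfn_to_hva_cache *ghc2,
                                       gpa_t gpa, unsigned long len)
        {
                /* Bytes of the runstate area in the first page. */
                unsigned long first = min(len,
                                          PAGE_SIZE - offset_in_page(gpa));
                int ret;

                ret = kvm_gfn_to_hva_cache_init(kvm, ghc1, gpa, first);
                if (ret || first == len)
                        return ret;

                /* The remainder starts at the next page boundary. */
                return kvm_gfn_to_hva_cache_init(kvm, ghc2, gpa + first,
                                                 len - first);
        }

The write path would then pick whichever cache covers the offset being
updated.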
