Message-ID: <YdhugJ6h76JLHTjT@google.com>
Date: Fri, 7 Jan 2022 16:46:56 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: David Stevens <stevensd@...omium.org>
Cc: Marc Zyngier <maz@...nel.org>, Paolo Bonzini <pbonzini@...hat.com>,
James Morse <james.morse@....com>,
Alexandru Elisei <alexandru.elisei@....com>,
Suzuki K Poulose <suzuki.poulose@....com>,
Will Deacon <will@...nel.org>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.cs.columbia.edu,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Chia-I Wu <olv@...omium.org>
Subject: Re: [PATCH v5 4/4] KVM: mmu: remove over-aggressive warnings
On Fri, Jan 07, 2022, Sean Christopherson wrote:
> On Fri, Jan 07, 2022, David Stevens wrote:
> > > > These are the type of pages which KVM is currently rejecting. Is this
> > > > something that KVM can support?
> > >
> > > I'm not opposed to it. My complaint is that this series is incomplete in that it
> > > allows mapping the memory into the guest, but doesn't support accessing the memory
> > > from KVM itself. That means for things to work properly, KVM is relying on the
> > > guest to use the memory in a limited capacity, e.g. isn't using the memory as
> > > general purpose RAM. That's not problematic for your use case, because presumably
> > > the memory is used only by the vGPU, but as is KVM can't enforce that behavior in
> > > any way.
> > >
> > > The really gross part is that failures are not strictly punted to userspace;
> > > the resulting error varies significantly depending on how the guest "illegally"
> > > uses the memory.
> > >
> > > My first choice would be to get the amdgpu driver "fixed", but that's likely an
> > > unreasonable request since it sounds like the non-KVM behavior is working as intended.
> > >
> > > One thought would be to require userspace to opt-in to mapping this type of memory
> > > by introducing a new memslot flag that explicitly states that the memslot cannot
> > > be accessed directly by KVM, i.e. can only be mapped into the guest. That way,
> > > KVM has an explicit ABI with respect to how it handles this type of memory, even
> > > though the semantics of exactly what will happen if userspace/guest violates the
> > > ABI are not well-defined. And internally, KVM would also have a clear touchpoint
> > > where it deliberately allows mapping such memslots, as opposed to the more implicit
> > > behavior of bypassing ensure_pfn_ref().
> >
> > Is it well defined when KVM needs to directly access a memslot?
>
> Not really, there's certainly no established rule.
>
> > At least for x86, it looks like most of the use cases are related to nested
> > virtualization, except for the call in emulator_cmpxchg_emulated.
>
> The emulator_cmpxchg_emulated() case will hopefully go away in the nearish future[*].
Forgot the link...
https://lore.kernel.org/all/YcG32Ytj0zUAW%2FB2@hirez.programming.kicks-ass.net/
> Paravirt features that communicate between guest and host via memory are the other
> common case where KVM maps a pfn.
>
> > Without being able to specifically state what should be avoided, a flag like
> > that would be difficult for userspace to use.
>
> Yeah :-( I was thinking KVM could state the flag would be safe to use if and only
> if userspace could guarantee that the guest would use the memory for some "special"
> use case, but hadn't actually thought about how to word things.
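To make the opt-in idea concrete, here is a rough userspace-modeled sketch of the shape such a flag could take. The flag name KVM_MEM_GUEST_ONLY and the check_direct_access() helper are invented purely for illustration; the real uapi memslot flags live in include/uapi/linux/kvm.h, and a real implementation would gate the internal map/access paths rather than a standalone helper:

```c
/* Hypothetical sketch of the opt-in memslot flag discussed above.
 * KVM_MEM_GUEST_ONLY is an invented name, not a real uapi flag. */
#include <assert.h>
#include <stdint.h>

#define KVM_MEM_LOG_DIRTY_PAGES (1U << 0)  /* existing uapi flag */
#define KVM_MEM_READONLY        (1U << 1)  /* existing uapi flag */
#define KVM_MEM_GUEST_ONLY      (1U << 2)  /* invented: slot may only be
                                              mapped into the guest */

struct kvm_memory_slot {
	uint32_t flags;
};

/* KVM-internal accessors (e.g. the kvm_vcpu_map() path) would refuse
 * slots that userspace marked guest-only, turning today's
 * guest-behavior-dependent failure modes into a well-defined error. */
static int check_direct_access(const struct kvm_memory_slot *slot)
{
	if (slot->flags & KVM_MEM_GUEST_ONLY)
		return -1; /* stand-in for -EPERM */
	return 0;
}
```

The point of the explicit flag is that the "KVM itself may not touch this memory" contract becomes part of the ABI, even if the consequences of the guest violating it stay loosely defined.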
>
> The best thing to do is probably to wait for kvm_vcpu_map() to be eliminated,
> as described in the changelogs for commits:
>
> 357a18ad230f ("KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache")
> 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status")
>
> Once that is done, everything in KVM will either access guest memory through the
> userspace hva, or via a mechanism that is tied into the mmu_notifier, at which
> point accessing non-refcounted struct pages is safe and just needs to worry about
> not corrupting _refcount.
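The "don't corrupt _refcount" rule boils down to: only take or drop a reference when the page actually is refcounted. A minimal userspace model of that invariant, with invented helper names (the real kernel primitives are get_page()/put_page() on struct page, and a zero refcount is how a non-refcounted pfn-mapped page looks):

```c
/* Userspace model of the refcount invariant; helper names are
 * illustrative, not real KVM/mm APIs. */
#include <assert.h>
#include <stdbool.h>

struct page {
	int _refcount;
};

static bool page_is_refcounted(const struct page *page)
{
	return page->_refcount != 0;
}

/* Take a reference only if the page participates in refcounting;
 * incrementing a non-refcounted page's _refcount from 0 would
 * corrupt mm's view of the page. */
static void kvm_page_get(struct page *page)
{
	if (page_is_refcounted(page))
		page->_refcount++;
}

/* Symmetric release: never drop a reference we never took. */
static void kvm_page_put(struct page *page)
{
	if (page_is_refcounted(page))
		page->_refcount--;
}
```

Once every access is either through the userspace hva or tied into the mmu_notifier, the lifetime problem is solved by invalidation callbacks, and this guard is the only remaining care needed for non-refcounted pages.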