[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230509124428.GA217130@chaop.bj.intel.com>
Date: Tue, 9 May 2023 20:44:28 +0800
From: Chao Peng <chao.p.peng@...ux.intel.com>
To: Ackerley Tng <ackerleytng@...gle.com>
Cc: Sean Christopherson <seanjc@...gle.com>, david@...hat.com,
pbonzini@...hat.com, vkuznets@...hat.com, jmattson@...gle.com,
joro@...tes.org, mail@...iej.szmigiero.name, vbabka@...e.cz,
vannapurve@...gle.com, yu.c.zhang@...ux.intel.com,
kirill.shutemov@...ux.intel.com, dhildenb@...hat.com,
qperret@...gle.com, tabba@...gle.com, michael.roth@....com,
wei.w.wang@...el.com, rppt@...nel.org, liam.merwick@...cle.com,
isaku.yamahata@...il.com, jarkko@...nel.org, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, hughd@...gle.com, brauner@...nel.org
Subject: Re: Rename restrictedmem => guardedmem? (was: Re: [PATCH v10 0/9]
KVM: mm: fd-based approach for supporting KVM)
On Fri, May 05, 2023 at 07:39:36PM +0000, Ackerley Tng wrote:
>
> Hi Sean,
>
> Thanks for implementing this POC!
>
> I’ve started porting the selftests (both Chao’s and those I added [1]).
Hi Sean/Ackerley,
Thanks for doing that. Overall making gmem a KVM ioctl() looks good to
me and it should also play nice with Intel TDX. Besides what Ackerley
mentioned below, I think we haven't discussed device assignment, which
will be supported in not too long distance. Current VFIO_IOMMU_MAP_DMA
consumes virtual address so that needs to be fixed for fd-based memory
anyway, and the fix looks not related to whether this being a syscall()
or a KVM ioctl(). There will be some initialization sequence dependency,
e.g. if gmem is finally a VM-scope ioctl() then we need VM created first
before can we map fd-based memory in VFIO, but that sounds not an issue
at all.
I also see Vlastimil/David expressed their preference on ioctl. So maybe
we can move forward on your current PoC. Do you already have a plan to
post a formal version?
Chao
>
> guest mem seems to cover the use cases that have been discussed and
> proposed so far, but I still need to figure out how gmem can work with
>
> + hugetlbfs
> + specification of/storing memory policy (for NUMA node bindings)
> + memory accounting - we may need to account for memory used separately,
> so that guest mem shows up separately on /proc/meminfo and similar
> places.
>
> One issue I’ve found so far is that the pointer to kvm (gmem->kvm) is
> not cleaned up, and hence it is possible to crash the host kernel in the
> following way
>
> 1. Create a KVM VM
> 2. Create a guest mem fd on that VM
> 3. Create a memslot with the guest mem fd (hence binding the fd to the
> VM)
> 4. Close/destroy the KVM VM
> 5. Call fallocate(PUNCH_HOLE) on the guest mem fd, which uses gmem->kvm
> when it tries to do invalidation.
>
> I then tried to clean up the gmem->kvm pointer during unbinding when the
> KVM VM is destroyed.
>
> That works, but then I realized there’s a simpler way to use the pointer
> after freeing:
>
> 1. Create a KVM VM
> 2. Create a guest mem fd on that VM
> 3. Close/destroy the KVM VM
> 4. Call fallocate(PUNCH_HOLE) on the guest mem fd, which uses gmem->kvm
> when it tries to do invalidation.
>
> Perhaps binding should mean setting the gmem->kvm pointer in addition to
> gmem->bindings. This makes binding and unbinding symmetric and avoids
> the use-after-frees described above.
>
> This also means that creating a guest mem fd is no longer dependent on
> the VM. Perhaps we can make creating a gmem fd a system ioctl (like
> KVM_GET_API_VERSION and KVM_CREATE_VM) instead of a vm ioctl?
>
> [1]
> https://lore.kernel.org/all/cover.1678926164.git.ackerleytng@google.com/T/
>
> Ackerley
Powered by blists - more mailing lists