Message-ID: <88cbf12e-30ec-4b2c-a97e-b8d9e9aa8dda@amd.com>
Date: Mon, 27 Oct 2025 17:55:23 +0530
From: "Garg, Shivank" <shivankg@....com>
To: Vlastimil Babka <vbabka@...e.cz>, Sean Christopherson
<seanjc@...gle.com>, Miguel Ojeda <ojeda@...nel.org>,
Marc Zyngier <maz@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>,
Paolo Bonzini <pbonzini@...hat.com>
Cc: linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Ackerley Tng <ackerleytng@...gle.com>, David Hildenbrand <david@...hat.com>,
Fuad Tabba <tabba@...gle.com>, Ashish Kalra <ashish.kalra@....com>
Subject: Re: [PATCH v13 04/12] KVM: guest_memfd: Add slab-allocated inode
cache
On 10/27/2025 4:36 PM, Vlastimil Babka wrote:
> On 10/16/25 19:28, Sean Christopherson wrote:
>> From: Shivank Garg <shivankg@....com>
>>
>> Add a dedicated gmem_inode structure and a slab-allocated inode cache for
>> guest memory backing, similar to how shmem handles inodes.
>>
>> This adds the necessary allocation/destruction functions and prepares
>> for upcoming guest_memfd NUMA policy support changes. Using a dedicated
>> structure will also allow for additional cleanups, e.g. to track flags in
>> gmem_inode instead of i_private.
>>
>> Signed-off-by: Shivank Garg <shivankg@....com>
>> Tested-by: Ashish Kalra <ashish.kalra@....com>
>> [sean: s/kvm_gmem_inode_info/gmem_inode, name init_once()]
>> Reviewed-by: Ackerley Tng <ackerleytng@...gle.com>
>> Tested-by: Ackerley Tng <ackerleytng@...gle.com>
>> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
>
> Reviewed-by: Vlastimil Babka <vbabka@...e.cz>
>
> Some nits below, not critical unless there's resubmit for other reasons:
Hi Vlastimil,
Thank you for the review.
>
>> @@ -860,13 +917,31 @@ static int kvm_gmem_init_mount(void)
>>
>> int kvm_gmem_init(struct module *module)
>> {
>> + struct kmem_cache_args args = {
>> + .align = 0,
>
> This seems unnecessary as it's implicit.
Ack
>> + .ctor = kvm_gmem_init_inode_once,
>> + };
>> + int ret;
>> +
>> kvm_gmem_fops.owner = module;
>> + kvm_gmem_inode_cachep = kmem_cache_create("kvm_gmem_inode_cache",
>> + sizeof(struct gmem_inode),
>> + &args, SLAB_ACCOUNT);
>> + if (!kvm_gmem_inode_cachep)
>> + return -ENOMEM;
>>
>> - return kvm_gmem_init_mount();
>> + ret = kvm_gmem_init_mount();
>> + if (ret) {
>> + kmem_cache_destroy(kvm_gmem_inode_cachep);
>> + return ret;
>> + }
>> + return 0;
>> }
>>
>> void kvm_gmem_exit(void)
>> {
>> kern_unmount(kvm_gmem_mnt);
>> kvm_gmem_mnt = NULL;
>> + rcu_barrier();
>
> Is it because VFS can do call_rcu() with something that ends up with
> kvm_gmem_free_inode()? Because nothing in this patch does that directly,
> maybe worth a comment?
Yes, exactly. I discovered this race condition while debugging a bug that
occurred during kvm_amd module unload after running a gmem-backed VM.
More details here:
https://lore.kernel.org/linux-mm/e7f7703d-fe76-4ab2-bef4-8d4c54da03ad@amd.com
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 427c0acee9d7..e1f69747fc84 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -969,7 +969,6 @@ static int kvm_gmem_init_mount(void)
int kvm_gmem_init(struct module *module)
{
struct kmem_cache_args args = {
- .align = 0,
.ctor = kvm_gmem_init_inode_once,
};
int ret;
@@ -993,6 +992,15 @@ void kvm_gmem_exit(void)
{
kern_unmount(kvm_gmem_mnt);
kvm_gmem_mnt = NULL;
+
+ /*
+ * Wait for all pending RCU callbacks to complete before destroying
+ * the inode cache. The VFS layer uses call_rcu() during inode
+ * eviction (via evict_inodes() -> destroy_inode() -> call_rcu()),
+ * which eventually calls kvm_gmem_free_inode().
+ * All such callbacks must have finished before kmem_cache_destroy()
+ * runs, so that no inode is freed into an already-destroyed cache.
+ */
rcu_barrier();
kmem_cache_destroy(kvm_gmem_inode_cachep);
}
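
The ordering that the comment above describes can be modeled in userspace.
The sketch below is a toy illustration, not kernel code: every toy_* name is
hypothetical, toy_call_rcu() and toy_rcu_barrier() only imitate the
queue-then-drain behaviour of call_rcu() and rcu_barrier(), and
toy_cache_destroy() simply reports whether objects were still outstanding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct toy_cache {
	int live_objects;	/* allocated and not yet freed */
};

struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

struct toy_inode {
	struct toy_rcu_head rcu;	/* first member, so a cast recovers the inode */
	struct toy_cache *cache;
};

static struct toy_rcu_head *pending;	/* callbacks queued but not yet run */

/* Imitates call_rcu(): queue the callback instead of running it now. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Imitates rcu_barrier(): return only after every queued callback ran. */
static void toy_rcu_barrier(void)
{
	while (pending) {
		struct toy_rcu_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

static struct toy_inode *toy_alloc_inode(struct toy_cache *cache)
{
	struct toy_inode *inode = malloc(sizeof(*inode));

	if (!inode)
		return NULL;
	inode->cache = cache;
	cache->live_objects++;
	return inode;
}

/* Deferred free, standing in for the role of kvm_gmem_free_inode(). */
static void toy_free_inode(struct toy_rcu_head *head)
{
	struct toy_inode *inode = (struct toy_inode *)head;

	inode->cache->live_objects--;
	free(inode);
}

/* Returns 0 if the cache was empty at destroy time, -1 otherwise. */
static int toy_cache_destroy(struct toy_cache *cache)
{
	return cache->live_objects == 0 ? 0 : -1;
}

/* Tear down with or without the barrier; returns the destroy result. */
int toy_exit(int use_barrier)
{
	struct toy_cache cache = { 0 };
	struct toy_inode *inode = toy_alloc_inode(&cache);
	int ret;

	if (!inode)
		return -2;

	/* Eviction only queues the free; the object is still live here. */
	toy_call_rcu(&inode->rcu, toy_free_inode);

	if (use_barrier)
		toy_rcu_barrier();

	ret = toy_cache_destroy(&cache);

	toy_rcu_barrier();	/* drain leftovers so the demo does not leak */
	return ret;
}
```

Under these assumptions, toy_exit(1) returns 0 because the queued free runs
before the cache is checked, while toy_exit(0) returns -1 because the inode
is still outstanding when the cache is destroyed.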
>
>> + kmem_cache_destroy(kvm_gmem_inode_cachep);
>> }
>