Message-ID: <aL1pSFB9iBsfHFM_@google.com>
Date: Sun, 7 Sep 2025 11:15:20 +0000
From: Alice Ryhl <aliceryhl@...gle.com>
To: Danilo Krummrich <dakr@...nel.org>
Cc: Boris Brezillon <boris.brezillon@...labora.com>, Matthew Brost <matthew.brost@...el.com>,
"Thomas Hellström" <thomas.hellstrom@...ux.intel.com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Steven Price <steven.price@....com>, Daniel Almeida <daniel.almeida@...labora.com>,
Liviu Dudau <liviu.dudau@....com>, dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup
On Sat, Sep 06, 2025 at 12:47:36AM +0200, Danilo Krummrich wrote:
> On Fri Sep 5, 2025 at 8:18 PM CEST, Alice Ryhl wrote:
> > On Fri, Sep 5, 2025 at 3:25 PM Boris Brezillon
> > <boris.brezillon@...labora.com> wrote:
> >> On Fri, 05 Sep 2025 12:11:28 +0000
> >> Alice Ryhl <aliceryhl@...gle.com> wrote:
> >> > +static bool
> >> > +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
> >> > +{
> >> > + return !kref_read(&vm_bo->kref);
> >>
> >> I'm not too sure I like the idea of [ab]using vm_bo::kref to defer the
> >> vm_bo release. I get why it's done like that, but I'm wondering why we
> >> don't defer the release of drm_gpuva objects instead (which is really
> >> what's being released in va_unlink()). I can imagine drivers wanting to
> >> attach resources to the gpuva that can't be released in the
> >> dma-signalling path in the future, and if we're doing that at the gpuva
> >> level, we also get rid of this kref dance, since the va will hold a
> >> vm_bo ref until it's destroyed.
> >>
> >> Any particular reason you went for vm_bo destruction deferral instead
> >> of gpuva?
> >
> > All of the things that were unsafe to release in the signalling path
> > were tied to the vm_bo, so that is why I went for vm_bo cleanup.
> > Another advantage is that it lets us use the same deferred logic for
> > the vm_bo_put() call that drops the refcount from vm_bo_obtain().
> >
> > Of course if gpuvas might have resources that need deferred cleanup,
> > that might change the situation somewhat.
>
> I think we want to track PT(E) allocations, or rather reference counts of page
> table structures carried by the drm_gpuva, but we don't need to release them on
> drm_gpuva_unlink(), which is where we drop the reference count of the vm_bo.
>
> Deferring drm_gpuva_unlink() isn't really an option I think; the GEM's list of
> VM_BOs and the VM_BO's list of VAs are usually used in ttm_device_funcs::move to
> map or unmap all VAs associated with a GEM object.
>
> I think PT(E) reference counts etc. should rather be released when the drm_gpuva
> is freed, i.e. page table allocations can be bound to the lifetime of a
> drm_gpuva. Given that, I think that eventually we'll need a cleanup list for
> those as well, since once they're removed from the VM tree (in the fence
> signalling critical path), we lose access otherwise.
Hmm. Another, more conceptual issue with deferring gpuva cleanup is that
"immediate mode" is defined as having the GPUVM match the GPU's actual
address space at all times, which deferred gpuva cleanup would go
against.

Deferring vm_bo cleanup doesn't have this issue: even though the vm_bo
isn't kfreed immediately, all GPUVM APIs still treat it as if it isn't
there anymore.
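
To make that concrete, a lookup can simply skip a vm_bo whose refcount
already hit zero, so callers never observe a deferred vm_bo. Rough
illustration only (drm_gpuvm_bo_find_alive() is a made-up name, not
code from this series):

struct drm_gpuvm_bo *
drm_gpuvm_bo_find_alive(struct drm_gpuvm *gpuvm, struct drm_gem_object *obj)
{
        struct drm_gpuvm_bo *vm_bo;

        /* Caller holds the GEM's gpuva lock. */
        drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
                if (vm_bo->vm != gpuvm)
                        continue;

                /*
                 * A vm_bo whose kref already dropped to zero is only
                 * waiting for deferred cleanup; treat it as gone.
                 */
                if (drm_gpuvm_bo_is_dead(vm_bo))
                        return NULL;

                return drm_gpuvm_bo_get(vm_bo);
        }

        return NULL;
}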
> >> > +static void
> >> > +drm_gpuvm_bo_defer_locked(struct kref *kref)
> >> > +{
> >> > + struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
> >> > + kref);
> >> > + struct drm_gpuvm *gpuvm = vm_bo->vm;
> >> > +
> >> > + if (!drm_gpuvm_resv_protected(gpuvm)) {
> >> > + drm_gpuvm_bo_list_del(vm_bo, extobj, true);
> >> > + drm_gpuvm_bo_list_del(vm_bo, evict, true);
> >> > + }
> >> > +
> >> > + list_del(&vm_bo->list.entry.gem);
> >> > + mutex_unlock(&vm_bo->obj->gpuva.lock);
> >>
> >> I got tricked by this implicit unlock, and the conditional unlocks it
> >> creates in drm_gpuva_unlink_defer(). Honestly, I'd rather see this
> >> unlock moved to drm_gpuva_unlink_defer() and a conditional unlock
> >> added to drm_gpuvm_bo_put_deferred(), because it's easier to reason
> >> about when the lock/unlock calls are in the same function
> >> (kref_put_mutex() being the equivalent of a conditional lock).
> >
> > Ok. I followed the docs of kref_put_mutex() that say to unlock it from
> > the function.
>
> Yes, please keep it the way it is, I don't want to deviate from what is
> documented and everyone else does. Besides that, I also think it's a little
> less error prone.
I gave it a try:
bool
drm_gpuvm_bo_put_deferred(struct drm_gpuvm_bo *vm_bo)
{
        if (!vm_bo)
                return false;

        drm_WARN_ON(vm_bo->vm->drm, !drm_gpuvm_immediate_mode(vm_bo->vm));

        if (kref_put_mutex(&vm_bo->kref, drm_gpuvm_bo_defer_locked,
                           &vm_bo->obj->gpuva.lock)) {
                /*
                 * It's important that the GEM stays alive for the duration in
                 * which drm_gpuvm_bo_defer_locked() holds the mutex, but the
                 * instant we add the vm_bo to bo_defer, another thread might
                 * call drm_gpuvm_bo_deferred_cleanup() and put the GEM. For
                 * this reason, we add the vm_bo to bo_defer *after* releasing
                 * the GEM's mutex.
                 */
                mutex_unlock(&vm_bo->obj->gpuva.lock);
                drm_gpuvm_bo_list_add(vm_bo, bo_defer, true);
                return true;
        }

        return false;
}
void
drm_gpuva_unlink_defer(struct drm_gpuva *va)
{
        struct drm_gem_object *obj = va->gem.obj;
        struct drm_gpuvm_bo *vm_bo = va->vm_bo;
        bool should_defer_bo;

        if (unlikely(!obj))
                return;

        drm_WARN_ON(vm_bo->vm->drm, !drm_gpuvm_immediate_mode(vm_bo->vm));

        mutex_lock(&obj->gpuva.lock);
        list_del_init(&va->gem.entry);

        /*
         * This is drm_gpuvm_bo_put_deferred() slightly modified since we
         * already hold the mutex. It's important that we add the vm_bo to
         * bo_defer after releasing the mutex for the same reason as in
         * drm_gpuvm_bo_put_deferred().
         */
        should_defer_bo = kref_put(&vm_bo->kref, drm_gpuvm_bo_defer_locked);
        mutex_unlock(&obj->gpuva.lock);
        if (should_defer_bo)
                drm_gpuvm_bo_list_add(vm_bo, bo_defer, true);

        va->vm_bo = NULL;
}
I do think it looks relatively nice like this, particularly
drm_gpuva_unlink_defer(). But that's also the one not using
kref_put_mutex().
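
For reference, this is roughly how I'd expect a driver to consume it
from the fence signalling path (sketch only; my_vm, my_vm_unmap_one()
and my_vm_cleanup_work() are made-up driver-side names, and I'm
assuming drm_gpuvm_bo_deferred_cleanup() takes the gpuvm pointer):

struct my_vm {
        struct drm_gpuvm gpuvm;
        struct work_struct cleanup_work;
};

/* Called from the dma-fence signalling critical path, e.g. run_job(). */
static void my_vm_unmap_one(struct my_vm *vm, struct drm_gpuva *va)
{
        drm_gpuva_remove(va);
        /* Only moves the vm_bo to bo_defer; no GEM references are dropped. */
        drm_gpuva_unlink_defer(va);
        kfree(va);
}

/* Runs on a workqueue, outside the signalling path. */
static void my_vm_cleanup_work(struct work_struct *work)
{
        struct my_vm *vm = container_of(work, struct my_vm, cleanup_work);

        /* Drops the deferred vm_bo and GEM references. */
        drm_gpuvm_bo_deferred_cleanup(&vm->gpuvm);
}

i.e. the only thing that happens in the signalling path is the unlink
itself; everything that drops GEM references is pushed to the worker.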
Alice