Message-ID: <20250908122002.2c80dd3a@fedora>
Date: Mon, 8 Sep 2025 12:20:02 +0200
From: Boris Brezillon <boris.brezillon@...labora.com>
To: "Danilo Krummrich" <dakr@...nel.org>
Cc: "Alice Ryhl" <aliceryhl@...gle.com>, "Matthew Brost"
 <matthew.brost@...el.com>, Thomas Hellström
 <thomas.hellstrom@...ux.intel.com>, "Maarten Lankhorst"
 <maarten.lankhorst@...ux.intel.com>, "Maxime Ripard" <mripard@...nel.org>,
 "Thomas Zimmermann" <tzimmermann@...e.de>, "David Airlie"
 <airlied@...il.com>, "Simona Vetter" <simona@...ll.ch>, "Steven Price"
 <steven.price@....com>, "Daniel Almeida" <daniel.almeida@...labora.com>,
 "Liviu Dudau" <liviu.dudau@....com>, <dri-devel@...ts.freedesktop.org>,
 <linux-kernel@...r.kernel.org>, <rust-for-linux@...r.kernel.org>
Subject: Re: [PATCH 1/2] drm/gpuvm: add deferred vm_bo cleanup

On Mon, 08 Sep 2025 10:47:25 +0200
"Danilo Krummrich" <dakr@...nel.org> wrote:

> On Mon Sep 8, 2025 at 10:26 AM CEST, Alice Ryhl wrote:
> > On Mon, Sep 08, 2025 at 09:11:40AM +0200, Boris Brezillon wrote:  
> >> Hi Alice,
> >> 
> >> On Sun, 7 Sep 2025 11:39:41 +0000
> >> Alice Ryhl <aliceryhl@...gle.com> wrote:
> >>   
> >> > Yeah I guess we could have unlink remove the gpuva, but then allow the
> >> > end-user to attach the gpuva to a list of gpuvas to kfree deferred. That
> >> > way, the drm_gpuva_unlink() is not deferred but any resources it has can
> >> > be.  
> >> 
> >> This ^.
> >>   
> >> > 
> >> > Of course, this approach also makes deferred gpuva cleanup somewhat
> >> > orthogonal to this patch.  
> >> 
> >> Well, yes and no, because if you go for gpuva deferred cleanup, you
> >> don't really need the fancy kref_put() you have in this patch, it's
> >> just a regular vm_bo_put() that's called in the deferred gpuva path on
> >> the vm_bo attached to the gpuva being released.  
> >
> > Ok, so what you suggest is that on gpuva_unlink() we remove the gpuva
> > from the vm_bo's list, but then instead of putting the vm_bo's refcount,
> > we add the gpuva to a list, and in the deferred cleanup codepath we
> > iterate gpuvas and drop vm_bo refcounts *at that point*. Is that
> > understood correctly?  
> 
> It has to be a special unlink function though, since otherwise
> 
> 	drm_gpuva_link();
> 	drm_gpuva_unlink();
> 	drm_gpuva_link();
> 	drm_gpuva_unlink();
> 
> leaks the VM_BO.

I'm not following. Yes, there's going to be a
drm_gpuva_unlink_defer_put() that skips the

        va->vm_bo = NULL;
        drm_gpuvm_bo_put(vm_bo);

and adds the gpuva to a list for deferred destruction. But I'm not
seeing where the leak is. Once the gpuva has been put on this list,
no component should be referring to this object anymore, and it's
going to be destroyed or recycled, but not re-used as-is.
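
To make it concrete, here's roughly what I have in mind. This is just
a sketch: the defer_list/defer_lock fields on drm_gpuvm and the
defer_entry list head on drm_gpuva don't exist, I'm making those names
up for the example, and where the gpuva memory ultimately gets freed
would stay driver-specific.

static void drm_gpuva_unlink_defer_put(struct drm_gpuva *va)
{
        struct drm_gem_object *obj = va->gem.obj;
        struct drm_gpuvm *gpuvm = va->vm;

        if (unlikely(!obj))
                return;

        drm_gem_gpuva_assert_lock_held(obj);
        list_del_init(&va->gem.entry);

        /* Unlike drm_gpuva_unlink(), keep va->vm_bo and the reference
         * it holds alive; both get released later from the deferred
         * cleanup path, where taking locks/freeing memory is safe.
         */
        spin_lock(&gpuvm->defer_lock);
        list_add_tail(&va->defer_entry, &gpuvm->defer_list);
        spin_unlock(&gpuvm->defer_lock);
}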


> Sounds a bit messy, but my concern is really about the below:
> 
> > That means we don't immediately remove the vm_bo from the gem.gpuva
> > list, but the gpuva list in the vm_bo will be empty. I guess you already
> > have to handle such vm_bos anyway since you can already have an empty
> > vm_bo in between vm_bo_obtain() and the first call to gpuva_link().
> >
> > One disadvantage is that we might end up preparing or unevicting a GEM
> > object that doesn't have any VAs left, which the current approach
> > avoids.  
> 
> Yeah, we really want to avoid that.

I think I agree that we want to avoid it, but I'm not too sure about
the solution that involves playing with vm_bo::kref. I'm particularly
worried by the fact that drivers can iterate the evict/extobj lists
directly, and can thus see objects scheduled for destruction. I know
there's a gpuvm_bo_is_dead() helper, and drivers should be aware of the
risks, but I still don't feel comfortable with this.
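
Just to illustrate, with the kref approach, any driver that walks the
evict list itself ends up having to filter dead entries manually,
something along those lines (driver-side pseudo-code, with
driver_unevict_obj() being a made-up driver hook):

        drm_gpuvm_resv_assert_held(gpuvm);
        list_for_each_entry(vm_bo, &gpuvm->evict.list, list.entry.evict) {
                if (drm_gpuvm_bo_is_dead(vm_bo))
                        continue;

                ret = driver_unevict_obj(vm_bo->obj);
                if (ret)
                        return ret;
        }

and forgetting the drm_gpuvm_bo_is_dead() check means operating on an
object that's about to be destroyed.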

And since we've mentioned the possibility of having to support
deferred gpuva destruction too, I'm wondering if it wouldn't be
cleaner to just go for this approach from the start (the gpuva owns a
ref on its vm_bo, which gets released when the gpuva object is
released).
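
That is, with this approach the deferred path becomes the only place
where the reference a gpuva holds on its vm_bo gets dropped, roughly
(same made-up defer_list/defer_entry names as above, and the kfree()
would in practice be a driver callback):

static void drm_gpuvm_cleanup_deferred(struct drm_gpuvm *gpuvm)
{
        struct drm_gpuva *va, *next;
        LIST_HEAD(list);

        spin_lock(&gpuvm->defer_lock);
        list_splice_init(&gpuvm->defer_list, &list);
        spin_unlock(&gpuvm->defer_lock);

        list_for_each_entry_safe(va, next, &list, defer_entry) {
                struct drm_gpuvm_bo *vm_bo = va->vm_bo;

                /* Safe context: we can take the GEM resv lock and
                 * free memory here. */
                va->vm_bo = NULL;
                drm_gpuvm_bo_put(vm_bo);
                kfree(va);
        }
}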
