linux-kernel - Re: [PATCH v2 2/2] rust: drm: Add GPUVM abstraction

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aAj7gAzFVRX3dN7L@pollux>
Date: Wed, 23 Apr 2025 16:38:56 +0200
From: Danilo Krummrich <dakr@...nel.org>
To: Daniel Almeida <daniel.almeida@...labora.com>
Cc: Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
	Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Benno Lossin <benno.lossin@...ton.me>,
	Andreas Hindborg <a.hindborg@...nel.org>,
	Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>,
	Sumit Semwal <sumit.semwal@...aro.org>,
	Christian König <christian.koenig@....com>,
	Boris Brezillon <boris.brezillon@...labora.com>,
	Alyssa Rosenzweig <alyssa@...enzweig.io>,
	Lyude Paul <lyude@...hat.com>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
	linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org,
	dri-devel@...ts.freedesktop.org, Asahi Lina <lina@...hilina.net>
Subject: Re: [PATCH v2 2/2] rust: drm: Add GPUVM abstraction

On Wed, Apr 23, 2025 at 10:29:01AM -0300, Daniel Almeida wrote:
> Hi Danilo,
> 
> FYI, most of this patch still retains the original code from the Asahi project,
> 
> > On 22 Apr 2025, at 18:16, Danilo Krummrich <dakr@...nel.org> wrote:
> > 
> > (Not a full review, but some things that took my attention from a brief look.)
> > 
> > On Tue, Apr 22, 2025 at 01:41:53PM -0300, Daniel Almeida wrote:
> 
> > 
> >> +        &mut self,
> >> +        gpuvm: &mut UpdatingGpuVm<'_, T>,
> >> +        gpuva: Pin<KBox<GpuVa<T>>>,
> >> +        gpuvmbo: &ARef<GpuVmBo<T>>,
> >> +    ) -> Result<(), Pin<KBox<GpuVa<T>>>> {
> >> +        // SAFETY: We are handing off the GpuVa ownership and it will not be moved.
> >> +        let p = KBox::leak(unsafe { Pin::into_inner_unchecked(gpuva) });
> >> +        // SAFETY: These C functions are called with the correct invariants
> >> +        unsafe {
> >> +            bindings::drm_gpuva_init_from_op(&mut p.gpuva, &mut self.0);
> >> +            if bindings::drm_gpuva_insert(gpuvm.0.gpuvm() as *mut _, &mut p.gpuva) != 0 {
> >> +                // EEXIST, return the GpuVa to the caller as an error
> >> +                return Err(Pin::new_unchecked(KBox::from_raw(p)));
> >> +            };
> > 
> > How do you ensure that the VM lock is held when there's a list of operations?
> 
> Are you saying that this will not work for async binds, or that it’s already broken in sync binds too?

I think here it's fine since sm_map() is implemented for LockedGpuVm, and that
seems to be the only place where you hand out a reference to an OpMap. But this
falls apart once we give out sets of operations, e.g. as list.

Even though, your patch doesn't do that yet, that's where we're heading to
eventually. Hence, the initial design must consider those things, otherwise
we only end up with something upstream that needs to be rewritten.

> 
> Perhaps we can get rid of this `UpdatingGpuVm` type and pass `LockedGpuVm` instead?

At a first glance I didn't get what's the difference between the two anyways.
But I don't think it's really related to the problem above, is it?

> 
> > 
> >> +            // SAFETY: This takes a new reference to the gpuvmbo.
> >> +            bindings::drm_gpuva_link(&mut p.gpuva, &gpuvmbo.bo as *const _ as *mut _);
> > 
> > How do you ensure that the dma_resv lock is held?
> 
> Right, I totally missed that.
> 
> > 
> >> +        }
> >> +        Ok(())
> >> +    }
> >> +}

<snip>

> >> +// SAFETY: DRM GpuVmBo objects are always reference counted and the get/put functions
> >> +// satisfy the requirements.
> >> +unsafe impl<T: DriverGpuVm> AlwaysRefCounted for GpuVmBo<T> {
> >> +    fn inc_ref(&self) {
> >> +        // SAFETY: The drm_gpuvm_get function satisfies the requirements for inc_ref().
> >> +        unsafe { bindings::drm_gpuvm_bo_get(&self.bo as *const _ as *mut _) };
> >> +    }
> >> +
> >> +    unsafe fn dec_ref(mut obj: NonNull<Self>) {
> >> +        // SAFETY: drm_gpuvm_bo_put() requires holding the gpuva lock, which is
> >> +        // the dma_resv lock by default.
> >> +        // The drm_gpuvm_put function satisfies the requirements for dec_ref().
> >> +        // (We do not support custom locks yet.)
> >> +        unsafe {
> >> +            let resv = (*obj.as_mut().bo.obj).resv;
> >> +            bindings::dma_resv_lock(resv, core::ptr::null_mut());
> >> +            bindings::drm_gpuvm_bo_put(&mut obj.as_mut().bo);
> >> +            bindings::dma_resv_unlock(resv);
> > 
> > What if the resv_lock is held already? Please also make sure to put multiple
> > unsafe calls each in a separate unsafe block.
> 
> By whom?

The lock might be held already by the driver or by TTM when things are called
from TTM callbacks.

This is why GPUVM never takes locks by itself, but asserts that the correct lock
is held.

I think we really want to get proof by the driver by providing lock guard
references.

> See my comment about “[..] a new type for GEM objects which are known to be locked"
> below.

<snip>

> >> +
> >> +    /// Obtains the [`GpuVmBo`] object that connects `obj` to this VM.
> >> +    ///
> >> +    /// This connection is unique, so an instane of [`GpuVmBo`] will be
> >> +    /// allocated for `obj` once, and that instance will be returned from that
> >> +    /// point forward.
> >> +    pub fn obtain_bo(&mut self, obj: &DriverObject<T>) -> Result<ARef<GpuVmBo<T>>> {
> >> +        // SAFETY: LockedGpuVm implies the right locks are held.
> > 
> > No, this must be locked by the dma-resv or the GEM gpuva lock, not by the
> > GPUVM lock that protects the interval tree.
> 
> By “GEM gpuva lock” you’re referring to the custom lock which we
> currently do not support, right?

Yes.

> This series currently rely on manual calls to dma_resv_{lock,unlock}, I wonder
> if we should ditch that in favor of something written in Rust directly. This
> would let us introduce a new type for GEM objects which are known to have
> `resv` locked. WDYT?

Not all functions that require the dma-resv lock to be held are called with a
GEM object parameter, it could also be a struct drm_gpuvm_bo, struct drm_gpuva
or struct drm_gpuvm, since they all carry GEM object pointers.

For reference, you can look for "_held" in drivers/gpu/drm/drm_gpuvm.c.