Message-ID: <aS6lz12BIysBVHSV@google.com>
Date: Tue, 2 Dec 2025 08:39:43 +0000
From: Alice Ryhl <aliceryhl@...gle.com>
To: Daniel Almeida <daniel.almeida@...labora.com>
Cc: Danilo Krummrich <dakr@...nel.org>, Matthew Brost <matthew.brost@...el.com>, 
	"Thomas Hellström" <thomas.hellstrom@...ux.intel.com>, 
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>, 
	Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>, 
	Boris Brezillon <boris.brezillon@...labora.com>, Steven Price <steven.price@....com>, 
	Liviu Dudau <liviu.dudau@....com>, Miguel Ojeda <ojeda@...nel.org>, 
	Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, 
	"Björn Roy Baron" <bjorn3_gh@...tonmail.com>, Benno Lossin <lossin@...nel.org>, 
	Andreas Hindborg <a.hindborg@...nel.org>, Trevor Gross <tmgross@...ch.edu>, 
	Frank Binns <frank.binns@...tec.com>, Matt Coster <matt.coster@...tec.com>, 
	Rob Clark <robin.clark@....qualcomm.com>, Dmitry Baryshkov <lumag@...nel.org>, 
	Abhinav Kumar <abhinav.kumar@...ux.dev>, Jessica Zhang <jessica.zhang@....qualcomm.com>, 
	Sean Paul <sean@...rly.run>, Marijn Suijten <marijn.suijten@...ainline.org>, 
	Lyude Paul <lyude@...hat.com>, Lucas De Marchi <lucas.demarchi@...el.com>, 
	Rodrigo Vivi <rodrigo.vivi@...el.com>, Sumit Semwal <sumit.semwal@...aro.org>, 
	"Christian König" <christian.koenig@....com>, dri-devel@...ts.freedesktop.org, 
	linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org, 
	linux-arm-msm@...r.kernel.org, freedreno@...ts.freedesktop.org, 
	nouveau@...ts.freedesktop.org, intel-xe@...ts.freedesktop.org, 
	linux-media@...r.kernel.org, linaro-mm-sig@...ts.linaro.org, 
	Asahi Lina <lina+kernel@...hilina.net>
Subject: Re: [PATCH 4/4] rust: drm: add GPUVM immediate mode abstraction

On Mon, Dec 01, 2025 at 12:16:09PM -0300, Daniel Almeida wrote:
> Hi Alice,
> 
> I find it a bit weird that we reverted to v1, given that the previous gpuvm
> attempt was v3. No big deal though.
> 
> 
> > On 28 Nov 2025, at 11:14, Alice Ryhl <aliceryhl@...gle.com> wrote:
> > 
> > Add a GPUVM abstraction to be used by Rust GPU drivers.
> > 
> > GPUVM keeps track of a GPU's virtual address (VA) space and manages the
> > corresponding virtual mappings represented by "GPU VA" objects. It also
> > keeps track of the gem::Object<T> used to back the mappings through
> > GpuVmBo<T>.
> > 
> > This abstraction is only usable by drivers that wish to use GPUVM in
> > immediate mode. This allows us to build the locking scheme into the API
> > design. It means that the GEM mutex is used for the GEM gpuva list, and
> > that the resv lock is used for the extobj list. The evicted list is not
> > yet used in this version.
> > 
> > This abstraction provides a special handle called the GpuVmCore, which
> > is a wrapper around ARef<GpuVm> that provides access to the interval
> > tree. Generally, all changes to the address space require mutable
> > access to this unique handle.
> > 
> > Some of the safety comments are still somewhat WIP, but I think the API
> > should be sound as-is.
> > 
> > Co-developed-by: Asahi Lina <lina+kernel@...hilina.net>
> > Signed-off-by: Asahi Lina <lina+kernel@...hilina.net>
> > Co-developed-by: Daniel Almeida <daniel.almeida@...labora.com>
> > Signed-off-by: Daniel Almeida <daniel.almeida@...labora.com>
> > Signed-off-by: Alice Ryhl <aliceryhl@...gle.com>

> > +//! DRM GPUVM in immediate mode
> > +//!
> > +//! Rust abstractions for using GPUVM in immediate mode. This is when the GPUVM state is updated
> > +//! during `run_job()`, i.e., in the DMA fence signalling critical path, to ensure that the GPUVM
> 
> IMHO: We should initially target synchronous VM_BINDS, which are the opposite
> of what you described above.

Immediate mode is a locking scheme, and we have to pick one regardless
of whether we do async VM_BIND yet.

(Well ok immediate mode is not just a locking scheme: it also determines
whether vm_bo cleanup is postponed or not.)

> > +/// A DRM GPU VA manager.
> > +///
> > +/// This object is refcounted, but the "core" is only accessible using a special unique handle. The
> 
> I wonder if `Owned<T>` is a good fit here? IIUC, Owned<T> can be refcounted,
> but there is only ever one handle on the Rust side? If so, this seems to be
> what we want here?

Yes, Owned<T> is probably a good fit.
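
For reference, a rough sketch of the shape that would give us (hedged:
Owned is the kernel crate's unique-handle type, GpuVm<T> would have to
implement its Ownable contract, and `example` is just an illustrative
function):

    fn example<T: DriverGpuVm>(core: Owned<GpuVm<T>>) {
        // Owned<_> derefs to GpuVm<T>, so the shared methods stay
        // available through the unique handle.
        let _vm: &GpuVm<T> = &core;
        // Additional ARef<GpuVm<T>> handles can still exist elsewhere:
        // the refcount keeps the object alive, but only the single
        // Owned<_> grants the exclusive access needed to mutate the
        // interval tree.
    }

That would encode "refcounted, but only one unique handle on the Rust
side" directly in the type instead of via a dedicated GpuVmCore wrapper.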

> > +/// core consists of the `core` field and the GPUVM's interval tree.
> > +#[repr(C)]
> > +#[pin_data]
> > +pub struct GpuVm<T: DriverGpuVm> {
> > +    #[pin]
> > +    vm: Opaque<bindings::drm_gpuvm>,
> > +    /// Accessed only through the [`GpuVmCore`] reference.
> > +    core: UnsafeCell<T>,
> 
> This UnsafeCell has been here since Lina’s version. I must say I never
> understood why, and perhaps now is a good time to clarify it given the changes
> we’re making w.r.t. the “unique handle” thing.
> 
> This is just some driver private data. It’s never shared with C. I am not
> sure why we need this wrapper.

The sm_step_* methods receive a `&mut T`. This is UB if other code has
an `&GpuVm<T>` and the `T` is not wrapped in an `UnsafeCell` because
`&GpuVm<T>` implies that the data is not modified.
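
To illustrate with a self-contained toy (DriverData/Vm are stand-ins,
not the actual GpuVm code):

    use core::cell::UnsafeCell;

    struct DriverData {
        counter: u64,
    }

    struct Vm {
        // Without the UnsafeCell, a live &Vm would promise the
        // compiler that `data` is never mutated, so producing a
        // &mut DriverData from it would be instant UB.
        data: UnsafeCell<DriverData>,
    }

    impl Vm {
        /// # Safety
        ///
        /// The caller must guarantee exclusive access to `data`, e.g.
        /// by holding the unique core handle while the callback runs.
        unsafe fn data_mut(&self) -> &mut DriverData {
            // SAFETY: Exclusivity is guaranteed by the caller, and
            // UnsafeCell opts the field out of the no-mutation-behind-
            // a-shared-reference rule.
            unsafe { &mut *self.data.get() }
        }
    }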

> > +    /// Shared data not protected by any lock.
> > +    #[pin]
> > +    shared_data: T::SharedData,
> 
> Should we deref to this?

We can do that.
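
Roughly this, I assume (sketch only, written against the fields quoted
above):

    impl<T: DriverGpuVm> core::ops::Deref for GpuVm<T> {
        type Target = T::SharedData;

        #[inline]
        fn deref(&self) -> &T::SharedData {
            // `shared_data` is explicitly not protected by any lock,
            // so handing out a shared reference is always fine.
            &self.shared_data
        }
    }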

> > +    /// Creates a GPUVM instance.
> > +    #[expect(clippy::new_ret_no_self)]
> > +    pub fn new<E>(
> > +        name: &'static CStr,
> > +        dev: &drm::Device<T::Driver>,
> > +        r_obj: &T::Object,
> 
> Can we call this “reservation_object”, or similar?
> 
> We should probably briefly explain what it does, perhaps linking to the C docs.

Yeah agreed, more docs are probably warranted here.

> I wonder if we should expose the methods below at this moment. We will not need
> them in Tyr until we start submitting jobs. This is still a bit in the future.
> 
> I say this for a few reasons:
> 
> a) Philipp is still working on the fence abstractions,
> 
> b) As a result of the above, we are taking raw fence pointers,
> 
> c) Onur is working on a WW Mutex abstraction [0] that includes a Rust
> implementation of drm_exec (under another name, and useful in other contexts
> outside of DRM). Should we use them here?
> 
> I think your current design with the ExecToken is also ok and perhaps we should
> stick to it, but it's good to at least discuss this with the others.

I don't think we can postpone adding the "obtain" method. It's required
to call sm_map, which is needed for VM_BIND.

> > +    /// Returns a [`GpuVmBoObtain`] for the provided GEM object.
> > +    #[inline]
> > +    pub fn obtain(
> > +        &self,
> > +        obj: &T::Object,
> > +        data: impl PinInit<T::VmBoData>,
> > +    ) -> Result<GpuVmBoObtain<T>, AllocError> {
> 
> Perhaps this should be called GpuVmBo? That’s what you want to “obtain” in the first place.
> 
> This is indeed a question, by the way.

One could possibly use Owned<_> here.
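
I.e., something along these lines (hedged sketch: the body stays as in
the patch, only the return type changes, and it assumes GpuVmBo can
satisfy whatever contract Owned requires):

    impl<T: DriverGpuVm> GpuVm<T> {
        /// Returns a unique handle to the [`GpuVmBo`] for the given
        /// GEM object, replacing the dedicated GpuVmBoObtain wrapper.
        pub fn obtain(
            &self,
            obj: &T::Object,
            data: impl PinInit<T::VmBoData>,
        ) -> Result<Owned<GpuVmBo<T>>, AllocError> {
            // Body elided; unchanged from the patch.
            todo!()
        }
    }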

> > +/// A lock guard for the GPUVM's resv lock.
> > +///
> > +/// This guard provides access to the extobj and evicted lists.
> 
> Should we bother with evicted objects at this stage?

The abstractions don't actually support them right now. The resv lock is
currently only here because it's used internally in these abstractions.
It won't be useful to drivers until we add evicted objects.

> > +///
> > +/// # Invariants
> > +///
> > +/// Holds the GPUVM resv lock.
> > +pub struct GpuvmResvLockGuard<'a, T: DriverGpuVm>(&'a GpuVm<T>);
> > +
> > +impl<T: DriverGpuVm> GpuVm<T> {
> > +    /// Lock the VM's resv lock.
> 
> More docs here would be nice.
> 
> > +    #[inline]
> > +    pub fn resv_lock(&self) -> GpuvmResvLockGuard<'_, T> {
> > +        // SAFETY: It's always ok to lock the resv lock.
> > +        unsafe { bindings::dma_resv_lock(self.raw_resv_lock(), ptr::null_mut()) };
> > +        // INVARIANTS: We took the lock.
> > +        GpuvmResvLockGuard(self)
> > +    }
> 
> You can call this more than once and deadlock. Perhaps we should warn about this, or forbid it?

Same as any other lock. I don't think we need to do anything special.

> > +    /// Use the pre-allocated VA to carry out this map operation.
> > +    pub fn insert(self, va: GpuVaAlloc<T>, va_data: impl PinInit<T::VaData>) -> OpMapped<'op, T> {
> > +        let va = va.prepare(va_data);
> > +        // SAFETY: By the type invariants we may access the interval tree.
> > +        unsafe { bindings::drm_gpuva_map(self.vm_bo.gpuvm().as_raw(), va, self.op) };
> > +        // SAFETY: The GEM object is valid, so the mutex is properly initialized.
> 
> > +        unsafe { bindings::mutex_lock(&raw mut (*self.op.gem.obj).gpuva.lock) };
> 
> Should we use Fujita’s might_sleep() support here?

Could make sense yeah.
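
Sketch of the call site (hypothetical: the exact name and path are
whatever Fujita's series ends up providing):

    // Document, and with CONFIG_DEBUG_ATOMIC_SLEEP runtime-check, that
    // this path may sleep, so atomic-context callers get caught even
    // when the mutex happens to be uncontended.
    kernel::task::might_sleep();
    // SAFETY: The GEM object is valid, so the mutex is properly
    // initialized.
    unsafe { bindings::mutex_lock(&raw mut (*self.op.gem.obj).gpuva.lock) };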

> > +/// ```
> > +/// struct drm_gpuva_op_unmap {
> > +/// /**
> > +/// * @va: the &drm_gpuva to unmap
> > +/// */
> > +/// struct drm_gpuva *va;
> > +///
> > +/// /**
> > +/// * @keep:
> > +/// *
> > +/// * Indicates whether this &drm_gpuva is physically contiguous with the
> > +/// * original mapping request.
> > +/// *
> > +/// * Optionally, if &keep is set, drivers may keep the actual page table
> > +/// * mappings for this &drm_gpuva, adding the missing page table entries
> > +/// * only and update the &drm_gpuvm accordingly.
> > +/// */
> > +/// bool keep;
> > +/// };
> > +/// ```
> 
> I think the docs could improve here ^

Yeah I can look at it.

> > +impl<T: DriverGpuVm> GpuVmCore<T> {
> > +    /// Create a mapping, removing or remapping anything that overlaps.
> > +    #[inline]
> > +    pub fn sm_map(&mut self, req: OpMapRequest<'_, T>) -> Result {
> 
> I wonder if we should keep this “sm” prefix. Perhaps
> “map_region” or “map_range” would be better names IMHO.

I'll wait for Danilo to weigh in on this. I'm not sure where "sm"
actually comes from.

> > +/// Represents that a given GEM object has at least one mapping on this [`GpuVm`] instance.
> > +///
> > +/// Does not assume that GEM lock is held.
> > +#[repr(C)]
> > +#[pin_data]
> > +pub struct GpuVmBo<T: DriverGpuVm> {
> 
> Oh, we already have GpuVmBo, and GpuVmBoObtain. I see.

Yeah, GpuVmBoObtain and GpuVmBoAlloc are pointers to GpuVmBo.

> > +    #[pin]
> > +    inner: Opaque<bindings::drm_gpuvm_bo>,
> > +    #[pin]
> > +    data: T::VmBoData,
> > +}
> > +
> > +impl<T: DriverGpuVm> GpuVmBo<T> {
> > +    pub(super) const ALLOC_FN: Option<unsafe extern "C" fn() -> *mut bindings::drm_gpuvm_bo> = {
> > +        use core::alloc::Layout;
> > +        let base = Layout::new::<bindings::drm_gpuvm_bo>();
> > +        let rust = Layout::new::<Self>();
> > +        assert!(base.size() <= rust.size());
> 
> We should default to something else instead of panicking IMHO.

This is const context, which makes it a build-time assertion: if the
layouts ever mismatch, compilation fails instead of anything panicking
at runtime.
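
Standalone illustration for anyone unfamiliar with the trick:

    // The initializer is evaluated by the compiler; a failing assert!
    // here rejects the build, so a layout mismatch can never make it
    // to runtime.
    const _LAYOUT_CHECK: () = {
        use core::alloc::Layout;

        struct Base(u64);
        struct Wrapper {
            base: Base,
            extra: u32,
        }

        let base = Layout::new::<Base>();
        let wrapper = Layout::new::<Wrapper>();
        assert!(base.size() <= wrapper.size());
    };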

> My overall opinion is that we’re adding a lot of things that will only be
> relevant when we’re more advanced on the job submission front. This
> includes the things that Philipp is working on (i.e., Fences + JobQueue).
> 
> Perhaps we should keep this iteration downstream (so we’re sure it works
> when the time comes) and focus on synchronous VM_BINDS upstream.
> The Tyr demo that you’ve tested this on is very helpful for this purpose.

Yeah, let's split out the prepare, GpuVmExec, and resv_add_fence stuff
into a separate patch.

I don't think sync vs async VM_BIND changes much in terms of which
methods or structs are required here. The only difference is whether
you call the methods from a workqueue or not.

Alice
