Message-Id: <A15F2FAB-3D7C-4DF6-9399-DCFCF34C4D8F@collabora.com>
Date: Tue, 2 Dec 2025 10:42:23 -0300
From: Daniel Almeida <daniel.almeida@...labora.com>
To: Alice Ryhl <aliceryhl@...gle.com>
Cc: Danilo Krummrich <dakr@...nel.org>,
Matthew Brost <matthew.brost@...el.com>,
Thomas Hellström <thomas.hellstrom@...ux.intel.com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>,
David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>,
Boris Brezillon <boris.brezillon@...labora.com>,
Steven Price <steven.price@....com>,
Liviu Dudau <liviu.dudau@....com>,
Miguel Ojeda <ojeda@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <lossin@...nel.org>,
Andreas Hindborg <a.hindborg@...nel.org>,
Trevor Gross <tmgross@...ch.edu>,
Frank Binns <frank.binns@...tec.com>,
Matt Coster <matt.coster@...tec.com>,
Rob Clark <robin.clark@....qualcomm.com>,
Dmitry Baryshkov <lumag@...nel.org>,
Abhinav Kumar <abhinav.kumar@...ux.dev>,
Jessica Zhang <jessica.zhang@....qualcomm.com>,
Sean Paul <sean@...rly.run>,
Marijn Suijten <marijn.suijten@...ainline.org>,
Lyude Paul <lyude@...hat.com>,
Lucas De Marchi <lucas.demarchi@...el.com>,
Rodrigo Vivi <rodrigo.vivi@...el.com>,
Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org,
rust-for-linux@...r.kernel.org,
linux-arm-msm@...r.kernel.org,
freedreno@...ts.freedesktop.org,
nouveau@...ts.freedesktop.org,
intel-xe@...ts.freedesktop.org,
linux-media@...r.kernel.org,
linaro-mm-sig@...ts.linaro.org,
Asahi Lina <lina+kernel@...hilina.net>
Subject: Re: [PATCH 4/4] rust: drm: add GPUVM immediate mode abstraction
> On 2 Dec 2025, at 05:39, Alice Ryhl <aliceryhl@...gle.com> wrote:
>
> On Mon, Dec 01, 2025 at 12:16:09PM -0300, Daniel Almeida wrote:
>> Hi Alice,
>>
>> I find it a bit weird that we reverted to v1, given that the previous gpuvm
>> attempt was v3. No big deal though.
>>
>>
>>> On 28 Nov 2025, at 11:14, Alice Ryhl <aliceryhl@...gle.com> wrote:
>>>
>>> Add a GPUVM abstraction to be used by Rust GPU drivers.
>>>
>>> GPUVM keeps track of a GPU's virtual address (VA) space and manages the
>>> corresponding virtual mappings represented by "GPU VA" objects. It also
>>> keeps track of the gem::Object<T> used to back the mappings through
>>> GpuVmBo<T>.
>>>
>>> This abstraction is only usable by drivers that wish to use GPUVM in
>>> immediate mode. This allows us to build the locking scheme into the API
>>> design. It means that the GEM mutex is used for the GEM gpuva list, and
>>> that the resv lock is used for the extobj list. The evicted list is not
>>> yet used in this version.
>>>
>>> This abstraction provides a special handle called the GpuVmCore, which
>>> is a wrapper around ARef<GpuVm> that provides access to the interval
>>> tree. Generally, all changes to the address space require mutable
>>> access to this unique handle.
>>>
>>> Some of the safety comments are still somewhat WIP, but I think the API
>>> should be sound as-is.
>>>
>>> Co-developed-by: Asahi Lina <lina+kernel@...hilina.net>
>>> Signed-off-by: Asahi Lina <lina+kernel@...hilina.net>
>>> Co-developed-by: Daniel Almeida <daniel.almeida@...labora.com>
>>> Signed-off-by: Daniel Almeida <daniel.almeida@...labora.com>
>>> Signed-off-by: Alice Ryhl <aliceryhl@...gle.com>
>
>>> +//! DRM GPUVM in immediate mode
>>> +//!
>>> +//! Rust abstractions for using GPUVM in immediate mode. This is when the GPUVM state is updated
>>> +//! during `run_job()`, i.e., in the DMA fence signalling critical path, to ensure that the GPUVM
>>
>> IMHO: We should initially target synchronous VM_BINDS, which are the opposite
>> of what you described above.
>
> Immediate mode is a locking scheme. We have to pick one of them
> regardless of whether we do async VM_BIND yet.
>
> (Well ok immediate mode is not just a locking scheme: it also determines
> whether vm_bo cleanup is postponed or not.)
>
>>> +/// A DRM GPU VA manager.
>>> +///
>>> +/// This object is refcounted, but the "core" is only accessible using a special unique handle. The
>>
>> I wonder if `Owned<T>` is a good fit here? IIUC, Owned<T> can be refcounted,
>> but there is only ever one handle on the Rust side? If so, this seems to be
>> what we want here?
>
> Yes, Owned<T> is probably a good fit.
>
>>> +/// core consists of the `core` field and the GPUVM's interval tree.
>>> +#[repr(C)]
>>> +#[pin_data]
>>> +pub struct GpuVm<T: DriverGpuVm> {
>>> + #[pin]
>>> + vm: Opaque<bindings::drm_gpuvm>,
>>> + /// Accessed only through the [`GpuVmCore`] reference.
>>> + core: UnsafeCell<T>,
>>
>> This UnsafeCell has been here since Lina’s version. I must say I never
>> understood why, and perhaps now is a good time to clarify it given the changes
>> we’re making w.r.t. the “unique handle” thing.
>>
>> This is just some driver private data. It’s never shared with C. I am not
>> sure why we need this wrapper.
>
> The sm_step_* methods receive a `&mut T`. This is UB if other code has
> an `&GpuVm<T>` and the `T` is not wrapped in an `UnsafeCell` because
> `&GpuVm<T>` implies that the data is not modified.
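
Ah, right, that makes sense. So the pattern is roughly the one below (minimal
sketch; `core_mut()` is a hypothetical helper I made up just to illustrate the
aliasing problem, it is not in the patch):

    use core::cell::UnsafeCell;

    /// Minimal stand-in for the real GpuVm<T>, only to show why the
    /// UnsafeCell is needed.
    pub struct GpuVm<T> {
        /// Interior mutability: a &GpuVm<T> may be alive while an sm_step_*
        /// callback mutates the driver data through the unique core handle.
        core: UnsafeCell<T>,
    }

    impl<T> GpuVm<T> {
        /// Hypothetical helper modelling what the sm_step_* glue does.
        ///
        /// # Safety
        ///
        /// The caller must guarantee that no other reference to the inner `T`
        /// exists, i.e. that it goes through the unique core handle.
        pub unsafe fn core_mut(&self) -> &mut T {
            // Without the UnsafeCell, producing a &mut T from behind a
            // &GpuVm<T> would be UB, because &GpuVm<T> asserts that the
            // pointee is never mutated.
            unsafe { &mut *self.core.get() }
        }
    }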
>
>>> + /// Shared data not protected by any lock.
>>> + #[pin]
>>> + shared_data: T::SharedData,
>>
>> Should we deref to this?
>
> We can do that.
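
Great. I was picturing something as simple as this (sketch, using the field
and associated type names from the patch):

    impl<T: DriverGpuVm> core::ops::Deref for GpuVm<T> {
        type Target = T::SharedData;

        #[inline]
        fn deref(&self) -> &Self::Target {
            &self.shared_data
        }
    }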
>
>>> + /// Creates a GPUVM instance.
>>> + #[expect(clippy::new_ret_no_self)]
>>> + pub fn new<E>(
>>> + name: &'static CStr,
>>> + dev: &drm::Device<T::Driver>,
>>> + r_obj: &T::Object,
>>
>> Can we call this “reservation_object”, or similar?
>>
>> We should probably briefly explain what it does, perhaps linking to the C docs.
>
> Yeah agreed, more docs are probably warranted here.
>
>> I wonder if we should expose the methods below at this moment. We will not need
>> them in Tyr until we start submitting jobs. This is still a bit in the future.
>>
>> I say this for a few reasons:
>>
>> a) Philipp is still working on the fence abstractions,
>>
>> b) As a result of the above, we are taking raw fence pointers,
>>
>> c) Onur is working on a WW Mutex abstraction [0] that includes a Rust
>> implementation of drm_exec (under another name, and useful in other contexts
>> outside of DRM). Should we use them here?
>>
>> I think your current design with the ExecToken is also ok and perhaps we should
>> stick to it, but it's good to at least discuss this with the others.
>
> I don't think we can postpone adding the "obtain" method. It's required
> to call sm_map, which is needed for VM_BIND.
>
>>> + /// Returns a [`GpuVmBoObtain`] for the provided GEM object.
>>> + #[inline]
>>> + pub fn obtain(
>>> + &self,
>>> + obj: &T::Object,
>>> + data: impl PinInit<T::VmBoData>,
>>> + ) -> Result<GpuVmBoObtain<T>, AllocError> {
>>
>> Perhaps this should be called GpuVmBo? That’s what you want to “obtain” in the first place.
>>
>> This is indeed a question, by the way.
>
> One could possibly use Owned<_> here.
>
>>> +/// A lock guard for the GPUVM's resv lock.
>>> +///
>>> +/// This guard provides access to the extobj and evicted lists.
>>
>> Should we bother with evicted objects at this stage?
>
> The abstractions don't actually support them right now. The resv lock is
> currently only here because it's used internally in these abstractions.
> It won't be useful to drivers until we add evicted objects.
>
>>> +///
>>> +/// # Invariants
>>> +///
>>> +/// Holds the GPUVM resv lock.
>>> +pub struct GpuvmResvLockGuard<'a, T: DriverGpuVm>(&'a GpuVm<T>);
>>> +
>>> +impl<T: DriverGpuVm> GpuVm<T> {
>>> + /// Lock the VM's resv lock.
>>
>> More docs here would be nice.
>>
>>> + #[inline]
>>> + pub fn resv_lock(&self) -> GpuvmResvLockGuard<'_, T> {
>>> + // SAFETY: It's always ok to lock the resv lock.
>>> + unsafe { bindings::dma_resv_lock(self.raw_resv_lock(), ptr::null_mut()) };
>>> + // INVARIANTS: We took the lock.
>>> + GpuvmResvLockGuard(self)
>>> + }
>>
>> You can call this more than once and deadlock. Perhaps we should warn about this, or forbid it?
>
> Same as any other lock. I don't think we need to do anything special.
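
Fair enough, as long as the guard unlocks on drop. Something along these
lines, I assume (sketch; raw_resv_lock() is the helper already used in the
patch, and dma_resv_unlock() is the usual counterpart of dma_resv_lock()):

    impl<T: DriverGpuVm> Drop for GpuvmResvLockGuard<'_, T> {
        fn drop(&mut self) {
            // SAFETY: By the type invariant this guard holds the resv lock,
            // so unlocking it here is valid.
            unsafe { bindings::dma_resv_unlock(self.0.raw_resv_lock()) };
        }
    }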
>
>>> + /// Use the pre-allocated VA to carry out this map operation.
>>> + pub fn insert(self, va: GpuVaAlloc<T>, va_data: impl PinInit<T::VaData>) -> OpMapped<'op, T> {
>>> + let va = va.prepare(va_data);
>>> + // SAFETY: By the type invariants we may access the interval tree.
>>> + unsafe { bindings::drm_gpuva_map(self.vm_bo.gpuvm().as_raw(), va, self.op) };
>>> + // SAFETY: The GEM object is valid, so the mutex is properly initialized.
>>
>>> + unsafe { bindings::mutex_lock(&raw mut (*self.op.gem.obj).gpuva.lock) };
>>
>> Should we use Fujita’s might_sleep() support here?
>
> Could make sense yeah.
>
>>> +/// ```
>>> +/// struct drm_gpuva_op_unmap {
>>> +/// /**
>>> +/// * @va: the &drm_gpuva to unmap
>>> +/// */
>>> +/// struct drm_gpuva *va;
>>> +///
>>> +/// /**
>>> +/// * @keep:
>>> +/// *
>>> +/// * Indicates whether this &drm_gpuva is physically contiguous with the
>>> +/// * original mapping request.
>>> +/// *
>>> +/// * Optionally, if &keep is set, drivers may keep the actual page table
>>> +/// * mappings for this &drm_gpuva, adding the missing page table entries
>>> +/// * only and update the &drm_gpuvm accordingly.
>>> +/// */
>>> +/// bool keep;
>>> +/// };
>>> +/// ```
>>
>> I think the docs could improve here ^
>
> Yeah I can look at it.
>
>>> +impl<T: DriverGpuVm> GpuVmCore<T> {
>>> + /// Create a mapping, removing or remapping anything that overlaps.
>>> + #[inline]
>>> + pub fn sm_map(&mut self, req: OpMapRequest<'_, T>) -> Result {
>>
>> I wonder if we should keep this “sm” prefix. Perhaps
>> “map_region” or “map_range” would be better names IMHO.
>
> I'll wait for Danilo to weigh in on this. I'm not sure where "sm"
> actually comes from.
“sm” is probably a reference to “split/merge”.
>
>>> +/// Represents that a given GEM object has at least one mapping on this [`GpuVm`] instance.
>>> +///
>>> +/// Does not assume that GEM lock is held.
>>> +#[repr(C)]
>>> +#[pin_data]
>>> +pub struct GpuVmBo<T: DriverGpuVm> {
>>
>> Oh, we already have GpuVmBo and GpuVmBoObtain. I see.
>
> Yeah, GpuVmBoObtain and GpuVmBoAlloc are pointers to GpuVmBo.
>
>>> + #[pin]
>>> + inner: Opaque<bindings::drm_gpuvm_bo>,
>>> + #[pin]
>>> + data: T::VmBoData,
>>> +}
>>> +
>>> +impl<T: DriverGpuVm> GpuVmBo<T> {
>>> + pub(super) const ALLOC_FN: Option<unsafe extern "C" fn() -> *mut bindings::drm_gpuvm_bo> = {
>>> + use core::alloc::Layout;
>>> + let base = Layout::new::<bindings::drm_gpuvm_bo>();
>>> + let rust = Layout::new::<Self>();
>>> + assert!(base.size() <= rust.size());
>>
>> We should default to something else instead of panicking IMHO.
>
> This is const context, which makes it a build assertion.
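
Right, I had missed that this is const-evaluated. For the record, a layout
check like that fails the build rather than panicking at runtime, e.g.:

    const _: () = {
        use core::alloc::Layout;
        // Evaluated at compile time; a false condition aborts the build
        // instead of panicking at runtime.
        assert!(Layout::new::<u32>().size() <= Layout::new::<u64>().size());
    };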
>
>> My overall opinion is that we’re adding a lot of things that will only be
>> relevant when we’re more advanced on the job submission front. This
>> includes the things that Philipp is working on (i.e., Fences + JobQueue).
>>
>> Perhaps we should keep this iteration downstream (so we’re sure it works
>> when the time comes) and focus on synchronous VM_BINDS upstream.
>> The Tyr demo that you’ve tested this on is very helpful for this purpose.
>
> Yeah let's split out the prepare, GpuVmExec, and resv_add_fence stuff to
> a separate patch.
Ack
>
> I don't think sync vs async VM_BIND changes much in which methods or
> structs are required here. Only difference is whether you call the
> methods from a workqueue or not.
>
> Alice