Message-ID: <bc4f01ec5172d29abd64429e3017cc53c0522e01.camel@mailbox.org>
Date: Tue, 25 Nov 2025 10:48:12 +0100
From: Philipp Stanner <phasta@...lbox.org>
To: Daniel Almeida <daniel.almeida@...labora.com>, Philipp Stanner
<phasta@...nel.org>
Cc: Alice Ryhl <aliceryhl@...gle.com>, Danilo Krummrich <dakr@...nel.org>,
Christian König <ckoenig.leichtzumerken@...il.com>,
Tvrtko Ursulin <tursulin@...ulin.net>, Alexandre Courbot
<acourbot@...dia.com>, Boris Brezillon <boris.brezillon@...labora.com>,
Dave Airlie <airlied@...hat.com>, Lyude Paul <lyude@...hat.com>, Peter
Colberg <pcolberg@...hat.com>, dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org
Subject: Re: [RFC WIP 2/3] rust: sync: Add dma_fence abstractions
On Mon, 2025-11-24 at 09:49 -0300, Daniel Almeida wrote:
> Hi Philipp,
>
> >
[…]
> > +use kernel::sync::{Arc, ArcBorrow};
> > +use kernel::c_str;
> > +
> > +/// Defines the callback function the dma-fence backend will call once the fence gets signalled.
> > +pub trait DmaFenceCbFunc {
> > + /// The callback function. `cb` is a container of the data which the driver passed in
> > + /// [`DmaFence::register_callback`].
> > + fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>)
> > + where
> > + Self: Sized;
> > +}
> > +
> > +/// Container for driver data which the driver gets back in its callback once the fence gets
> > +/// signalled.
> > +#[pin_data]
> > +pub struct DmaFenceCb<T: DmaFenceCbFunc> {
> > + /// C struct needed for the backend.
> > + #[pin]
> > + inner: Opaque<bindings::dma_fence_cb>,
> > + /// Driver data.
> > + #[pin]
> > + pub data: T,
>
> We should probably deref to this type.
As a transparent container, you mean? I can put it on the TODO list.
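Something like this, I suppose (untested sketch):

    impl<T: DmaFenceCbFunc> core::ops::Deref for DmaFenceCb<T> {
        type Target = T;

        fn deref(&self) -> &T {
            &self.data
        }
    }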
>
> > +}
> > +
> > +impl<T: DmaFenceCbFunc + 'static> DmaFenceCb<T> {
> >
[…]
> > +impl ArcBorrow<'_, DmaFenceCtx> {
> > + /// Create a new fence, consuming `data`.
> > + ///
> > + /// The fence will increment the refcount of the fence context associated with this
> > + /// [`DmaFenceCtx`].
> > + pub fn new_fence<T>(
> > + &mut self,
> > + data: impl PinInit<T>,
> > + ) -> Result<ARef<DmaFence<T>>> {
> > + let fctx: Arc<DmaFenceCtx> = (*self).into();
> > + let seqno: u64 = fctx.get_new_fence_seqno();
> > +
> > + // TODO: Should we reset seqno in case of failure?
>
> I think we should go back to the old value, yeah.
That would be trivial to implement (just an atomic decrement).
The reason the TODO even exists is that I'm a bit unsure about races:
if the user / driver creates fences with more than one thread on a
fence context, it seems we have to choose between either a gap in the
seqnos or the possibility of seqnos being out of order.
We're pretty free in our choices, however. The shared fence-fctx
spinlock will be removed anyway, so one could later easily replace the
fctx atomic with a lock if that's desirable.
I can implement a seqno decrement for now.
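Roughly like this (sketch; rollback_fence_seqno() would be a new,
hypothetical helper doing the atomic decrement):

    pub fn new_fence<T>(
        &mut self,
        data: impl PinInit<T>,
    ) -> Result<ARef<DmaFence<T>>> {
        let fctx: Arc<DmaFenceCtx> = (*self).into();
        let seqno: u64 = fctx.get_new_fence_seqno();

        let fence = DmaFence::new(fctx.clone(), data, &self.lock, self.nr, seqno);
        if fence.is_err() {
            // Undo the increment so the next fence reuses the seqno.
            // Note the race window: a concurrent new_fence() may
            // already have observed the incremented value.
            fctx.rollback_fence_seqno();
        }

        fence
    }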
>
> > + // Pass `fctx` by value so that the fence will hold a reference to the DmaFenceCtx as long
> > + // as it lives.
> > + DmaFence::new(fctx, data, &self.lock, self.nr, seqno)
> > + }
> > +}
> > +
> >
[…]
> > +
> > +impl<T> DmaFence<T> {
> > + // TODO: There could be a subtle potential problem here? The LLVM compiler backend can create
> > + // several versions of this constant. Their content would be identical, but their addresses
> > + // different.
> > + const OPS: bindings::dma_fence_ops = Self::ops_create();
> > +
> > + /// Create an initializer for a new [`DmaFence`].
> > + fn new(
> > + fctx: Arc<DmaFenceCtx>,
> > + data: impl PinInit<T>, // TODO: The driver data should implement PinInit<T, Error>
> > + lock: &Opaque<bindings::spinlock_t>,
>
> I wonder if we should take a Rust lock directly?
Yes, good idea; Boqun has suggested that in the first dma_fence RFC,
too. It will be as simple as SpinLock<()>.
Will do in dma_fence RFC v2 and rebase the Jobqueue on it.
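Sketch of what I have in mind (field names made up; one still needs a
way to hand the raw spinlock_t to dma_fence_init(), which the
abstraction would have to provide):

    use kernel::{new_spinlock, sync::SpinLock};

    #[pin_data]
    pub struct DmaFenceCtx {
        /// Lock shared by all fences of this context. `()` because we
        /// only need the lock itself, not data protected by it.
        #[pin]
        lock: SpinLock<()>,
        // ...
    }

    // In the fence context initializer:
    pin_init!(Self {
        lock <- new_spinlock!(()),
        // ...
    })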
>
> > + context: u64,
> > + seqno: u64,
> > + ) -> Result<ARef<Self>> {
> > + let fence = pin_init!(Self {
> > + inner <- Opaque::ffi_init(|slot: *mut bindings::dma_fence| {
> > + let lock_ptr = &raw const (*lock);
> > + // SAFETY: `slot` is a valid pointer to an uninitialized `struct dma_fence`.
> > + // `lock_ptr` is a pointer to the spinlock of the fence context, which is shared
>
> Then we should perhaps extract that lock from the fence context itself, instead
> of passing it as an argument? This relationship is not enforced in the current
> code.
See the series linked in the cover letter. Soon fences will all have
their own lock, and the fctx will either just be the atomic seqno or
have a separate lock.
>
> > + // among all the fences. This can't become a UAF because each fence takes a
> > + // reference of the fence context.
> > + unsafe { bindings::dma_fence_init(slot, &Self::OPS, Opaque::cast_into(lock_ptr), context, seqno) };
> > + }),
> > + data <- data,
> > + signalling: false,
> > + signalling_cookie: false,
> > + fctx: fctx,
> > + });
> > +
> > + let b = KBox::pin_init(fence, GFP_KERNEL)?;
> > +
> > + // SAFETY: We don't move the contents of `b` anywhere here. After unwrapping it, ARef will
> > + // take care of preventing memory moves.
> > + let rawptr = KBox::into_raw(unsafe { Pin::into_inner_unchecked(b) });
> > +
> > + // SAFETY: `rawptr` was created validly above.
> > + let aref = unsafe { ARef::from_raw(NonNull::new_unchecked(rawptr)) };
> > +
> > + Ok(aref)
> > + }
> > +
> > + /// Mark the beginning of a DmaFence signalling critical section. Should be called once a fence
> > + /// gets published.
> > + ///
> > + /// The signalling critical section is marked as finished automatically once the fence signals.
> > + pub fn begin_signalling(&mut self) {
> > + // FIXME: this needs to be mutable, obviously, but we can't borrow mutably. *sigh*
>
> Is AtomicBool going away? Otherwise can you expand?
The AtomicBool is just used in the example demo code.
The issue here is that begin_signalling() should set a "this fence is
currently in the signalling section" flag, so the fence needs to be
mutable. Then, however, Rust complains, because self.signalling is not
protected by any lock.
So one needs some sort of synchronization. Stuffing a DmaFence into a
SpinLock would be overkill, though, considering that the C code
already takes care of taking all the necessary locks.
I've asked about that problem on Zulip once:
https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/.E2.9C.94.20ARef.20without.20locking/near/539747635
I haven't looked deeper into solving it since, because those lockdep
guards are only nice-to-have at the moment.
I think the solution will be to make self.signalling an AtomicBool
(that's what you meant above, I think?).
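Untested sketch with core's atomics:

    use core::sync::atomic::{AtomicBool, Ordering};

    // In DmaFence<T>: `signalling: AtomicBool` instead of `bool`.

    pub fn begin_signalling(&self) {
        // A shared reference suffices now; no mutable borrow needed.
        // (`signalling_cookie` would need the same treatment.)
        self.signalling.store(true, Ordering::Relaxed);

        // SAFETY: `dma_fence_begin_signalling()` works on global
        // lockdep data and calling it is always safe.
        let _cookie = unsafe { bindings::dma_fence_begin_signalling() };
    }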
>
> > + self.signalling = true;
> > + // TODO: Should we warn if a user tries to do this several times for the same
> > + // fence? And should we ignore the request if the fence is already signalled?
> > +
> > + // SAFETY: `dma_fence_begin_signalling()` works on global lockdep data and calling it is
> > + // always safe.
> > + self.signalling_cookie = unsafe { bindings::dma_fence_begin_signalling() };
> > + }
> > +
> >
> > +
> > + /// Signal the fence. This will invoke all registered callbacks.
> > + pub fn signal(&self) -> Result {
> > + // SAFETY: `self` is refcounted.
> > + let ret = unsafe { bindings::dma_fence_signal(self.as_raw()) };
> > + if ret != 0 {
> > + return Err(Error::from_errno(ret));
> > + }
>
> Do you think it’s worth it to have a separate error type for when fences
> are already signaled? I am asking, but I am not convinced either, FYI.
My tendency so far has been to mostly ignore it. The C users don't
care whether the fence was already signaled; dma_fence just returns
-EINVAL in that case (not a good choice anyway, IMO).
AFAIR I was close to not having Rust's signal() return an error at
all, because it's kind of useless.
But, correct me if I'm wrong, I think the policy for Rust abstractions
is to map the C API as closely as possible?
Anyway, I'd expect Rust drivers to ignore that error, too.
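Something like (sketch):

    if let Err(e) = fence.signal() {
        // dma_fence_signal() returns -EINVAL if the fence was
        // signaled already. That's harmless, so just log it.
        pr_debug!("fence already signaled: {:?}\n", e);
    }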
>
> > +
> > + if self.signalling {
> > + // SAFETY: `dma_fence_end_signalling()` works on global lockdep data. The only
> > + // parameter is a boolean passed by value.
> > + unsafe { bindings::dma_fence_end_signalling(self.signalling_cookie) };
> > + }
> > +
> > + Ok(())
> > + }
> > +
> > + /// Register a callback on a [`DmaFence`]. The callback will be invoked in the fence's
> > + /// signalling path, i.e., critical section.
> > + ///
> > + /// Consumes `data`. `data` is passed back to the implemented callback function when the fence
> > + /// gets signalled.
> > + pub fn register_callback<U: DmaFenceCbFunc + 'static>(&self, data: impl PinInit<U>) -> Result {
> > + let cb = DmaFenceCb::new(data)?;
> > + let ptr = cb.into_foreign() as *mut DmaFenceCb<U>;
> > + // SAFETY: `ptr` was created validly directly above.
> > + let inner_cb = unsafe { (*ptr).inner.get() };
> > +
> > + // SAFETY: `self.as_raw()` is valid because `self` is refcounted, `inner_cb` was created
> > + // validly above and was turned into a ForeignOwnable, so it won't be dropped. `callback`
> > + // has a static lifetime.
> > + let ret = unsafe {
> > + bindings::dma_fence_add_callback(
> > + self.as_raw(),
> > + inner_cb,
> > + Some(DmaFenceCb::<U>::callback),
> > + )
> > + };
> > + if ret != 0 {
> > + return Err(Error::from_errno(ret));
>
> Related to the question above: not being able to place a callback seems to be
> common in the DRM scheduler (i.e.: the fence has signaled already). Hence my
> question about a separate error type, as we would have to single out
> Err(EINVAL) often otherwise.
Interesting, dma_fence_add_callback() actually returns a different
error code, ENOENT, if the fence was signalled already, whereas
dma_fence_signal() gives EINVAL.
That's suboptimal. I'll consider fixing it on the C side.
Regardless, what's relevant for us here is that not being able to
register a callback is a serious failure that needs to be checked,
whereas trying to signal an already signaled fence is valid behavior
and no big deal.
If you register a callback on an already signaled fence, that
effectively means that you, as the registering party, need to take
care of the callback's code being executed, since dma_fence won't do
that anymore. The jobqueue design catches that problem through the API
design; you were asking about the cb-registering API in the other
mail, I'll answer there.
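Singling out the already-signaled case in a driver would look roughly
like this (sketch; assumes the returned Error can be compared against
the ENOENT constant from kernel::error::code):

    use kernel::error::code::ENOENT;

    match fence.register_callback(data) {
        Ok(()) => {}
        Err(e) if e == ENOENT => {
            // The fence signaled already; dma_fence won't invoke the
            // callback, so run its work here ourselves.
            run_callback_work(); // hypothetical
        }
        Err(e) => return Err(e),
    }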
Thx for the review!
P.