[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aYw65omhksM7jjQi@google.com>
Date: Wed, 11 Feb 2026 08:16:38 +0000
From: Alice Ryhl <aliceryhl@...gle.com>
To: Boris Brezillon <boris.brezillon@...labora.com>
Cc: "Christian König" <christian.koenig@....com>, Philipp Stanner <phasta@...lbox.org>, phasta@...nel.org,
Danilo Krummrich <dakr@...nel.org>, David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Gary Guo <gary@...yguo.net>, Benno Lossin <lossin@...nel.org>,
Daniel Almeida <daniel.almeida@...labora.com>, Joel Fernandes <joelagnelf@...dia.com>,
linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
rust-for-linux@...r.kernel.org
Subject: Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
On Tue, Feb 10, 2026 at 03:50:25PM +0100, Boris Brezillon wrote:
> On Tue, 10 Feb 2026 14:11:12 +0000
> Alice Ryhl <aliceryhl@...gle.com> wrote:
>
> > On Tue, Feb 10, 2026 at 02:51:56PM +0100, Boris Brezillon wrote:
> > > On Tue, 10 Feb 2026 13:26:48 +0000
> > > Alice Ryhl <aliceryhl@...gle.com> wrote:
> > >
> > > > On Tue, Feb 10, 2026 at 01:49:13PM +0100, Boris Brezillon wrote:
> > > > > On Tue, 10 Feb 2026 10:15:04 +0000
> > > > > Alice Ryhl <aliceryhl@...gle.com> wrote:
> > > > >
> > > > > > /// The owner of this value must ensure that this fence is signalled.
> > > > > > struct MustBeSignalled<'fence> { ... }
> > > > > > /// Proof value indicating that the fence has either already been
> > > > > > /// signalled, or it will be. The lifetime ensures that you cannot mix
> > > > > > /// up the proof value.
> > > > > > struct WillBeSignalled<'fence> { ... }
> > > > >
> > > > > Sorry, I have more questions, unfortunately. Seems that
> > > > > {Must,Will}BeSignalled are targeting specific fences (at least that's
> > > > > what the doc and 'fence lifetime says), but in practice, the WorkItem
> > > > > backing the scheduler can queue 0-N jobs (0 if no jobs have their deps
> > > > > met, and N > 1 if more than one job is ready). Similarly, an IRQ
> > > > > handler can signal 0-N fences (can be that the IRQ has nothing to do we
> > > > > job completion, or, it can be that multiple jobs have completed). How
> > > > > is this MustBeSignalled object going to be instantiated in practice if
> > > > > it's done before the DmaFenceWorkItem::run() function is called?
> > > >
> > > > The {Must,Will}BeSignalled closure pair needs to wrap the piece of code
> > > > that ensures a specific fence is signalled. If you have code that
> > > > manages a collection of fences and invokes code for specific fences
> > > > depending on outside conditions, then that's a different matter.
> > > >
> > > > After all, transfer_to_wq() has two components:
> > > > 1. Logic to ensure any spawned workqueue job eventually gets to run.
> > > > 2. Once the individual job runs, logic specific to the one fence ensures
> > > > that this one fence gets signalled.
> > >
> > > Okay, that's a change compared to how things are modeled in C (and in
> > > JobQueue) at the moment: the WorkItem is not embedded in a specific
> > > job, it's something that's attached to the JobQueue. The idea being
> > > that the WorkItem represents a task to be done on the queue itself
> > > (check if the first element in the queue is ready for execution), not on
> > > a particular job. Now, we could change that and have a per-job WorkItem,
> > > but ultimately, we'll have to make sure jobs are dequeued in order
> > > (deps on JobN can be met before deps on Job0, but we still want JobN to
> > > be submitted after Job0), and we'd pay the WorkItem overhead once per
> > > Job instead of once per JobQueue. Probably not the end of the world,
> > > but it's worth considering, still.
> >
> > It sounds like the fix here is to have transfer_to_job_queue() instead
> > of trying to do it at the workqueue level.
>
> Hm, so Job would be something like that (naming/trait-def are just
> suggestions to get the discussion going):
>
> trait JobConsumer {
> type FenceType;
> type JobData;
>
> fn run(self: MustBeSignalled<T::FenceType>) -> Result<WillBeSignaled<Self::FenceType>>;
> }
>
> struct Job<T: JobConsumer> {
> fence: MustBeSignalled<T::FenceType>,
> data: T::JobData,
> }
The fence field of Job would be PublishedFence or PrivateFence (or just
DriverDmaFence). The MustBeSignalled/WillBeSignaled types should only
exist temporarily in a function scope.
Any time you transfer from one function scope to another (like our
transfer_to_job_queue() or transfer_to_wq() examples), that results in
finishing the MustBeSignalled/WillBeSignaled scope on one thread and
creating a new MustBeSignalled/WillBeSignaled scope on another thread.
One could imagine a model where there is no lifetime and you can carry
it around as you wish. That model works okay in most regards, but it
gives up the ability to ensure that dma_fence_lockdep_map is properly
configured to catch mistakes.
The lifetime prohibits you from using the normal ownership semantics to
e.g. transfer the MustBeSignalled into a random workqueue, enforcing
that you can only transfer it into a workqueue by using the provided
methods, which sets up the lockdep dependencies correctly and ensures
that dma_fence_lockdep_map is taken in the workqueue job too.
> I guess that would do.
>
> And then we need to flag the WorkItem that's exposed by the
> JobQueue as a DmaFenceWorkItem so that
> bindings::dma_fence_begin_signalling() is called before entry and
> lockdep can do its job and check that nothing forbidden happens in
> this WorkItem.
In the case of JobQueue, it may make sense to just have the job queue
implementation do that manually. I do not think the workqueue-level API
can fully enforce that the job queue can't make mistakes here.
> > > > And {Must,Will}BeSignalled exists to help model part (2.). But what you
> > > > described with the IRQ callback falls into (1.) instead, which is
> > > > outside the scope of {Must,Will}BeSignalled (or at least requires more
> > > > complex APIs).
> > >
> > > For IRQ callbacks, it's not just about making sure they run, but also
> > > making sure nothing in there can lead to deadlocks, which is basically
> > > #2, except it's not scoped to a particular fence. It's just a "fences
> > > can be signaled from there" marker. We could restrict it to "fences of
> > > this particular implementation can be signaled from there" but not
> > > "this particular fence instance will be signaled next, if any", because
> > > that we don't know until we've walked some HW state to figure out which
> > > job is complete and thus which fence we need to signal (the interrupt
> > > we get is most likely multiplexing completion on multiple GPU contexts,
> > > so before we can even get to our per-context in-flight-jobs FIFO, we
> > > need to demux this thing).
> >
> > All I can say is that this is a different use-case for the C api
> > dma_fence_begin_signalling(). This different usage also seems useful,
> > but it would be one that does not involve {Must,Will}BeSignalled
> > arguments at all.
> >
> > After all, dma_fence_begin_signalling() only requires those arguments if
> > you want to convert a PrivateFence into a PublishedFence. (I guess a
> > better name is PublishableFence.) If you're not trying to prove that a
> > specific fence will be signalled, then you don't need the
> > {Must,Will}BeSignalled arguments.
>
> Okay, so that would be another function returning some sort of guard
> then? What I find confusing is the fact
> dma_fence::dma_fence_begin_signalling() matches the C function name
> which is not per-fence, but just this lock-guard model flagging a
> section from which any fence can be signalled, so maybe we should
> name your dma_fence_begin_signalling() proposal differently, dunno.
Yes we would need multiple methods that call dma_fence_begin_signalling()
depending on why you are calling it.
Alice
Powered by blists - more mailing lists