Message-ID: <d9e00d8c-93c0-45f8-95f2-3235fb781e7a@amd.com>
Date: Tue, 10 Feb 2026 16:45:14 +0100
From: Christian König <christian.koenig@....com>
To: Alice Ryhl <aliceryhl@...gle.com>
Cc: Boris Brezillon <boris.brezillon@...labora.com>,
Philipp Stanner <phasta@...lbox.org>, phasta@...nel.org,
Danilo Krummrich <dakr@...nel.org>, David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>, Gary Guo <gary@...yguo.net>,
Benno Lossin <lossin@...nel.org>,
Daniel Almeida <daniel.almeida@...labora.com>,
Joel Fernandes <joelagnelf@...dia.com>, linux-kernel@...r.kernel.org,
dri-devel@...ts.freedesktop.org, rust-for-linux@...r.kernel.org
Subject: Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
On 2/10/26 16:07, Alice Ryhl wrote:
> On Tue, Feb 10, 2026 at 02:56:52PM +0100, Christian König wrote:
>> On 2/10/26 14:49, Alice Ryhl wrote:
>>> On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
>>>> On Tue, 10 Feb 2026 13:15:31 +0000
>>>> Alice Ryhl <aliceryhl@...gle.com> wrote:
>>>>
>>>>> On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
>>>>>> On Tue, 10 Feb 2026 10:15:04 +0000
>>>>>> Alice Ryhl <aliceryhl@...gle.com> wrote:
>>>>>>
>>>>>>> impl MustBeSignalled<'_> {
>>>>>>>     /// Drivers generally should not use this one.
>>>>>>>     fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
>>>>>>>
>>>>>>>     /// One way to ensure the fence has been signalled is to signal it.
>>>>>>>     fn signal_fence(self) -> WillBeSignalled {
>>>>>>>         self.fence.signal();
>>>>>>>         self.i_promise_it_will_be_signalled()
>>>>>>>     }
>>>>>>>
>>>>>>>     /// Another way to ensure the fence will be signalled is to spawn a
>>>>>>>     /// workqueue item that promises to signal it.
>>>>>>>     fn transfer_to_wq(
>>>>>>>         self,
>>>>>>>         wq: &Workqueue,
>>>>>>>         item: impl DmaFenceWorkItem,
>>>>>>>     ) -> WillBeSignalled {
>>>>>>>         // Briefly take and release the lock class of the wq to tell
>>>>>>>         // lockdep that the signalling path "blocks" on arbitrary jobs
>>>>>>>         // from this wq completing.
>>>>>>>         bindings::lock_acquire(&wq.key);
>>>>>>>         bindings::lock_release(&wq.key);
>>>>>>>
>>>>>>>         // Enqueue the job.
>>>>>>>         wq.enqueue(item);
>>>>>>>
>>>>>>>         // The signature of DmaFenceWorkItem::run() promises to arrange
>>>>>>>         // for the fence to be signalled.
>>>>>>>         self.i_promise_it_will_be_signalled()
>>>>>>>     }
>>>>>>
>>>>>> I guess what's still missing is some sort of `transfer_to_hw()`
>>>>>> function and way to flag the IRQ handler taking over the fence
>>>>>> signaling token.
>>>>>
>>>>> Yes, transfer to hardware needs to be another piece of logic similar to
>>>>> transfer to wq. And I imagine there are many ways such a transfer to
>>>>> hardware could work.
>>>>>
>>>>> Unless you have a timeout on it, in which case the WillBeSignalled is
>>>>> satisfied by the fact you have a timeout alone, and the signalling that
>>>>> happens from the irq is just an opportunistic signal from outside the
>>>>> dma fence signalling critical path.
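The guarantee Alice describes here, that handing the fence to hardware only discharges the obligation when a timeout is armed, could be encoded in the token's API. A plain-Rust userspace toy (every name below, `HwRing`, `transfer_to_hw`, the `timeout` field, is hypothetical and only sketches the idea, not the actual patch):

```rust
use std::time::Duration;

// Toy stand-ins for the types discussed in the thread.
struct Fence;
struct MustBeSignalled {
    #[allow(dead_code)]
    fence: Fence,
}
struct WillBeSignalled;

// Hypothetical hardware ring: it may or may not have a timeout armed.
struct HwRing {
    timeout: Option<Duration>,
}

impl MustBeSignalled {
    // Handing the obligation to hardware is only sound if a timeout
    // handler exists: the irq alone gives no forward-progress
    // guarantee, it is just an opportunistic early signal. Without a
    // timeout the obligation token is handed back to the caller.
    fn transfer_to_hw(self, ring: &HwRing) -> Result<WillBeSignalled, Self> {
        match ring.timeout {
            Some(_) => Ok(WillBeSignalled),
            None => Err(self),
        }
    }
}

fn main() {
    let with_timeout = HwRing { timeout: Some(Duration::from_secs(1)) };
    let token = MustBeSignalled { fence: Fence };
    assert!(token.transfer_to_hw(&with_timeout).is_ok());

    let no_timeout = HwRing { timeout: None };
    let token = MustBeSignalled { fence: Fence };
    assert!(token.transfer_to_hw(&no_timeout).is_err());
}
```

The `Result` return makes "no timeout armed" a case the caller must handle explicitly rather than something that silently loses the signalling guarantee.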
>>>>
>>>> Yes and no. If it deadlocks in the completion WorkItem because of
>>>> allocations (or any of the forbidden use cases), I think we want to
>>>> catch that, because that's a sign fences are likely to end up with
>>>> timeouts when they should have otherwise been signaled properly.
>>>>
>>>>> Well ... unless triggering timeouts can block on GFP_KERNEL
>>>>> allocations...
>>>>
>>>> I mean, the timeout handler should also be considered a DMA-signalling
>>>> path, and the same rules should apply to it.
>>>
>>> I guess that's fair. Even with a timeout you want both to be signalling
>>> paths.
>>>
>>> I guess more generally, if a fence is signalled by mechanism A or B,
>>> whichever happens first, you have the choice between:
>>
>> That doesn't happen in practice.
>>
>> For each fence you only have one signaling path you need to guarantee
>> forward progress for.
>>
>> All other signaling paths are just opportunistic optimizations
>> which *can* signal the fence, but there is no guarantee that they
>> will.
>>
>> We used to have some exceptions to that, especially around aborting
>> submissions, but those turned out to be a really bad idea as well.
>>
>> Thinking more about it, you should probably enforce that there is only
>> one signaling path for each fence.
>
> I'm not really convinced by this.
>
> First, the timeout path must be a fence signalling path because the
> reason you have a timeout in the first place is because the hw might
> never signal the fence. So if the timeout path deadlocks on a
> kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
Mhm, good point. On the other hand, the timeout handling should probably be considered part of the normal signaling path.
In other words, the timeout handler either disables the normal signaling path (e.g. by disabling the interrupt) and then resets the HW, or it tells the HW to force-signal some work and observes the result.
So it can happen that the timeout handler finishes only after the fence has been signaled from the normal signaling path.
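That "exactly one guaranteed path" property is what the token type up-thread buys: since the obligation is a non-Clone value consumed on signalling, a second guaranteed path cannot even be expressed. A plain-Rust userspace toy (types simplified from the thread; the `Arc<Fence>` layout and field names are made up for illustration):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Toy stand-in for a dma_fence: just a signalled flag.
struct Fence {
    signalled: AtomicBool,
}

impl Fence {
    fn signal(&self) {
        self.signalled.store(true, Ordering::Release);
    }
}

// Token proving someone still owes a signal on this fence. It is
// neither Clone nor Copy, so exactly one guaranteed path holds it.
struct MustBeSignalled {
    fence: Arc<Fence>,
}

// Token proving the signal has been arranged.
struct WillBeSignalled;

impl MustBeSignalled {
    // Discharge the obligation by signalling right now. Consuming
    // `self` means the obligation cannot be discharged twice.
    fn signal_fence(self) -> WillBeSignalled {
        self.fence.signal();
        WillBeSignalled
    }
}

fn main() {
    let fence = Arc::new(Fence { signalled: AtomicBool::new(false) });
    let token = MustBeSignalled { fence: fence.clone() };
    let _done: WillBeSignalled = token.signal_fence();
    assert!(fence.signalled.load(Ordering::Acquire));
    // A second `token.signal_fence()` would not compile: the token moved.
}
```

Any opportunistic extra path (irq, early completion) then just calls `fence.signal()` on the shared fence without ever holding the token.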
> Second, for the reasons I mentioned you also want the signal-from-irq
> path to be a fence signalling critical path, because if we allow you to
> kmalloc(GFP_KERNEL) on the path from getting notification from hardware
> to signalling the fence, then you may deadlock until the timeout
> triggers ... even if the deadlock is only temporary, we should still
> avoid such cases IMO. Thus, the hw signal path should also be a fence
> signalling critical path.
As far as I remember we didn't have any such cases.
You can't call kmalloc(GFP_KERNEL) from an interrupt handler, so you would need something like irq -> work item -> kmalloc(GFP_KERNEL) -> signaling, and I think it's unlikely anything is implemented this way.
But yeah, it is still something which should be prevented somehow.
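The existing kernel-side prevention is the dma_fence_begin_signalling()/dma_fence_end_signalling() lockdep annotation, which makes a GFP_KERNEL allocation inside the critical section a reported dependency rather than a silent hazard. The mechanism can be illustrated with a userspace Rust toy (not the real API; the thread-local flag and the fake allocator below are invented for illustration):

```rust
use std::cell::Cell;

thread_local! {
    // Toy stand-in for lockdep's signalling annotation: set while we
    // are inside a fence-signalling critical section on this thread.
    static IN_SIGNALLING: Cell<bool> = Cell::new(false);
}

// RAII guard mimicking begin/end of a signalling critical section.
struct SignallingSection;

impl SignallingSection {
    fn begin() -> Self {
        IN_SIGNALLING.with(|f| f.set(true));
        SignallingSection
    }
}

impl Drop for SignallingSection {
    fn drop(&mut self) {
        IN_SIGNALLING.with(|f| f.set(false));
    }
}

// Stand-in for a GFP_KERNEL allocation: it may block on reclaim, so
// it must never run inside a signalling critical section. Here the
// violation is reported as an error instead of a lockdep splat.
fn may_block_alloc() -> Result<Vec<u8>, &'static str> {
    if IN_SIGNALLING.with(|f| f.get()) {
        return Err("blocking allocation inside signalling section");
    }
    Ok(vec![0u8; 16])
}

fn main() {
    // Outside the section, the allocation is fine.
    assert!(may_block_alloc().is_ok());

    // The irq -> work item -> kmalloc -> signal chain from the thread
    // would trip the check once the work item enters the section.
    let _section = SignallingSection::begin();
    assert!(may_block_alloc().is_err());
}
```

The real annotation additionally feeds lockdep's dependency graph, so the report fires even if the deadlock never actually occurs at runtime.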
Christian.
>
> Alice