Message-Id: <DGC1KP1DT6YV.3LQWZXMA22L5A@kernel.org>
Date: Wed, 11 Feb 2026 10:57:27 +0100
From: "Danilo Krummrich" <dakr@...nel.org>
To: "Alice Ryhl" <aliceryhl@...gle.com>
Cc: Christian König <christian.koenig@....com>, "Boris
Brezillon" <boris.brezillon@...labora.com>, "Philipp Stanner"
<phasta@...lbox.org>, <phasta@...nel.org>, "David Airlie"
<airlied@...il.com>, "Simona Vetter" <simona@...ll.ch>, "Gary Guo"
<gary@...yguo.net>, "Benno Lossin" <lossin@...nel.org>, "Daniel Almeida"
<daniel.almeida@...labora.com>, "Joel Fernandes" <joelagnelf@...dia.com>,
<linux-kernel@...r.kernel.org>, <dri-devel@...ts.freedesktop.org>,
<rust-for-linux@...r.kernel.org>, <lucas.demarchi@...el.com>,
<thomas.hellstrom@...ux.intel.com>, <rodrigo.vivi@...el.com>
Subject: Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
(Cc: Xe maintainers)
On Tue Feb 10, 2026 at 12:40 PM CET, Alice Ryhl wrote:
> On Tue, Feb 10, 2026 at 11:46:44AM +0100, Christian König wrote:
>> On 2/10/26 11:36, Danilo Krummrich wrote:
>> > On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
>> >> One way you can see this is by looking at what we require of the
>> >> workqueue. For all this to work, it's pretty important that we never
>> >> schedule anything on the workqueue that's not signalling safe, since
>> >> otherwise you could have a deadlock where the workqueue executes some
>> >> random job that calls kmalloc(GFP_KERNEL) and then blocks on our fence,
>> >> meaning that the VM_BIND job never gets scheduled since the workqueue
>> >> is never freed up. Deadlock.
>> >
>> > Yes, I also pointed this out multiple times in the past in the context of C GPU
>> > scheduler discussions. It really depends on the workqueue and how it is used.
>> >
>> > In the C GPU scheduler the driver can pass its own workqueue to the scheduler,
>> > which means that the driver has to ensure that at least one out of the
>> > wq->max_active works is free for the scheduler to make progress on the
>> > scheduler's run and free job work.
>> >
>> > Or in other words, there must be no more than wq->max_active - 1 works that
>> > execute code violating the DMA fence signalling rules.
>
> Ouch, is that really the best way to do that? Why not two workqueues?
Most drivers making use of this re-use the same workqueue for multiple GPU
scheduler instances in firmware scheduling mode (i.e. 1:1 relationship between
scheduler and entity). This is equivalent to the JobQ use-case.
Note that we will have one JobQ instance per userspace queue, so sharing the
workqueue between JobQ instances can make sense.
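To make the wq->max_active constraint quoted above concrete, here is a minimal
sketch of that sharing pattern (driver names, structures and numbers are made
up for illustration; drm_sched_init_args as of v6.18):

#include <drm/gpu_scheduler.h>
#include <linux/workqueue.h>

#define MY_WQ_MAX_ACTIVE	16	/* arbitrary example value */

/* Hypothetical driver structures, purely for illustration. */
struct my_device {
	struct workqueue_struct *sched_wq;
};

struct my_queue {
	struct drm_gpu_scheduler sched;
};

/* A real driver would fill in run_job(), free_job(), etc. */
static const struct drm_sched_backend_ops my_sched_ops;

int my_device_init(struct my_device *mdev)
{
	/*
	 * Every work item queued here must either be signalling safe, or
	 * at most MY_WQ_MAX_ACTIVE - 1 works may violate the DMA fence
	 * signalling rules, such that one slot always remains for the
	 * scheduler's run / free job work to make progress.
	 */
	mdev->sched_wq = alloc_workqueue("my-sched", 0, MY_WQ_MAX_ACTIVE);
	if (!mdev->sched_wq)
		return -ENOMEM;

	return 0;
}

/* One scheduler instance per userspace queue (1:1 scheduler : entity). */
int my_queue_init(struct my_device *mdev, struct my_queue *q)
{
	const struct drm_sched_init_args args = {
		.ops = &my_sched_ops,
		.submit_wq = mdev->sched_wq,	/* shared between instances */
		.num_rqs = 1,
		.credit_limit = 64,
		.timeout = HZ,
		.name = "my-fw-queue",
	};

	return drm_sched_init(&q->sched, &args);
}

JobQ instances sharing a workqueue would follow the same pattern, with the
same responsibility on the driver to account for the max_active budget.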
Besides that, IIRC Xe was re-using the workqueue for something else, but that
doesn't seem to be the case anymore. I can only find [1], which seems more like
some custom GPU scheduler extension [2] to me...
[1] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/gpu/drm/xe/xe_gpu_scheduler.c#L40
[2] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h#L28