Message-ID: <aa7dedae-8f31-49f9-ad73-009cb8550b93@kernel.org>
Date: Tue, 7 Oct 2025 16:44:59 +0200
From: Danilo Krummrich <dakr@...nel.org>
To: Boris Brezillon <boris.brezillon@...labora.com>
Cc: Philipp Stanner <phasta@...lbox.org>, phasta@...nel.org,
 Tvrtko Ursulin <tvrtko.ursulin@...lia.com>, dri-devel@...ts.freedesktop.org,
 amd-gfx@...ts.freedesktop.org, kernel-dev@...lia.com,
 intel-xe@...ts.freedesktop.org, cgroups@...r.kernel.org,
 linux-kernel@...r.kernel.org, Christian König
 <christian.koenig@....com>, Leo Liu <Leo.Liu@....com>,
 Maíra Canal <mcanal@...lia.com>,
 Matthew Brost <matthew.brost@...el.com>, Michal Koutný
 <mkoutny@...e.com>, Michel Dänzer
 <michel.daenzer@...lbox.org>,
 Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>,
 Rob Clark <robdclark@...il.com>, Tejun Heo <tj@...nel.org>,
 Alexandre Courbot <acourbot@...dia.com>, Alistair Popple
 <apopple@...dia.com>, John Hubbard <jhubbard@...dia.com>,
 Joel Fernandes <joelagnelf@...dia.com>, Timur Tabi <ttabi@...dia.com>,
 Alex Deucher <alexander.deucher@....com>,
 Lucas De Marchi <lucas.demarchi@...el.com>,
 Thomas Hellström <thomas.hellstrom@...ux.intel.com>,
 Rodrigo Vivi <rodrigo.vivi@...el.com>, Rob Herring <robh@...nel.org>,
 Steven Price <steven.price@....com>, Liviu Dudau <liviu.dudau@....com>,
 Daniel Almeida <daniel.almeida@...labora.com>,
 Alice Ryhl <aliceryhl@...gle.com>, Boqun Feng <boqunf@...flix.com>,
 Grégoire Péan <gpean@...flix.com>,
 Simona Vetter <simona@...ll.ch>, airlied@...il.com
Subject: Re: [RFC v8 00/21] DRM scheduling cgroup controller

On 9/30/25 1:57 PM, Boris Brezillon wrote:
> Can you remind me what the problem is? I thought the lifetime issue was
> coming from the fact the drm_sched ownership model was lax enough that
> the job could be owned by both drm_gpu_scheduler and drm_sched_entity
> at the same time.

I don't think that's (directly) a thing from the perspective of the drm_sched
design. A job should be either in the entity's queue or on the pending_list of
the scheduler.

However, different drivers do implement their own lifetime (and ownership) model
on top of that, because they ultimately have to deal with jobs being tied either
to the entity's or the scheduler's lifetime, which is anything but
straightforward in error cases and teardown paths.

And the fundamental reason why drivers implement their own rules on top of this
is that it is hard to deal with jobs being tied to entirely different lifetime
models depending on their state.
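
To illustrate what I mean with state-dependent lifetimes, here's a rough
sketch (made-up names, not the actual drm_sched or Jobqueue code):

// Hypothetical sketch of the state-dependent ownership described above;
// the names are illustrative, not actual drm_sched API.

struct Job {
    id: u64,
}

enum JobOwner {
    // Before dispatch the job lives in the entity's queue, so its
    // lifetime is bounded by the entity's lifetime.
    Entity(Job),
    // Once dispatched it moves to the scheduler's pending_list, so its
    // lifetime is bounded by the scheduler's lifetime instead.
    Scheduler(Job),
}

fn teardown(owner: JobOwner) {
    // Teardown has to get both cases right: destroying an entity must
    // not touch jobs that already migrated to the scheduler, and
    // destroying the scheduler must not touch jobs still owned by an
    // entity -- which is exactly where drivers grow their own rules.
    match owner {
        JobOwner::Entity(job) => println!("entity frees job {}", job.id),
        JobOwner::Scheduler(job) => println!("scheduler frees job {}", job.id),
    }
}

fn main() {
    teardown(JobOwner::Entity(Job { id: 1 }));
    teardown(JobOwner::Scheduler(Job { id: 2 }));
}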

So, what I'm saying is that from the perspective of the component itself it's
probably fine, but in its application in drivers it's the root cause of a lot of
the hacks we see layered on top of the scheduler.

Some of those hacks even make their way into the scheduler [1].

[1]
https://elixir.bootlin.com/linux/v6.17.1/source/drivers/gpu/drm/scheduler/sched_main.c#L1439

>> Instead, I think the new Jobqueue should always own and always dispatch jobs
>> directly and provide some "control API" to be instructed by an external
>> component (orchestrator) on top of it when and to which ring to dispatch jobs.
> 
> Feels to me that we're getting back to a model where the JobQueue needs
> to know about the upper-layer in charge of the scheduling. I mean, it
> can work, but you're adding some complexity back to JobQueue, which I
> was expecting to be a simple FIFO with a dep-tracking logic.

Yes, the Jobqueue would need an interface to the orchestrator. I'd rather have
that complexity encapsulated in the Jobqueue than push it to drivers through a
more complex lifetime and ownership model that leaks into drivers, as mentioned
above.
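
Something along these lines, as a rough sketch (all names made up, not a
concrete API proposal):

// Hypothetical sketch of a Jobqueue <-> orchestrator interface; names
// are illustrative only. The point is that the Jobqueue always owns and
// dispatches its jobs, and the orchestrator only decides when and to
// which ring.

/// Opaque handle for a firmware ring.
#[derive(Clone, Copy, Debug)]
struct RingId(u32);

/// Implemented by the external orchestrator; called by the Jobqueue
/// whenever a job has all its dependencies resolved.
trait Orchestrator {
    /// Pick a ring for the next ready job, or None to hold it back.
    fn select_ring(&mut self) -> Option<RingId>;
}

struct Jobqueue<O: Orchestrator> {
    orchestrator: O,
    ready_jobs: Vec<u64>, // IDs of jobs whose dependencies are resolved
}

impl<O: Orchestrator> Jobqueue<O> {
    fn dispatch_ready(&mut self) {
        while !self.ready_jobs.is_empty() {
            let Some(ring) = self.orchestrator.select_ring() else {
                break; // orchestrator says: not now
            };
            let job = self.ready_jobs.remove(0);
            println!("dispatching job {job} to {ring:?}");
        }
    }
}

/// Trivial example orchestrator: round-robin over two rings.
struct RoundRobin {
    next: u32,
}

impl Orchestrator for RoundRobin {
    fn select_ring(&mut self) -> Option<RingId> {
        let ring = RingId(self.next % 2);
        self.next += 1;
        Some(ring)
    }
}

fn main() {
    let mut queue = Jobqueue {
        orchestrator: RoundRobin { next: 0 },
        ready_jobs: vec![1, 2, 3],
    };
    queue.dispatch_ready();
}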

> I have a hard time seeing how it can fully integrate in this
> orchestrator model. We can hook ourselves in the JobQueue::run_job()
> and schedule the group for execution when we queue a job to the
> ringbuf, but the group scheduler would still be something on the side.

Can you please expand a bit more on the group model?

My understanding is that you have a limited number of firmware rings (R) and
each of those rings has N slots, where N is the number of queue types supported
by the GPU.

So, you need something that can schedule "groups" of queues over all available
firmware rings, because it would be pointless to schedule each individual queue
independently, as a firmware ring has slots for each of those. Is that correct?
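
To check my understanding, the scheduling problem would then roughly look
like this (sketch with made-up names):

// Hypothetical sketch of the group model as I understand it; names are
// made up. R firmware rings, each with one slot per queue type (N).

const QUEUE_TYPES: usize = 3; // N, e.g. GFX / compute / copy

/// A group bundles one queue per queue type and is scheduled as a
/// unit, since a ring has exactly one slot for each type anyway.
#[derive(Clone)]
struct Group {
    id: u32,
    queues: [bool; QUEUE_TYPES], // which queue types have pending jobs
}

/// Time-slices G groups over R rings (typically G > R).
struct GroupScheduler {
    rings: Vec<Option<u32>>, // group currently occupying each ring
    runnable: Vec<Group>,    // groups waiting for a ring
}

impl GroupScheduler {
    /// Assign waiting groups to free rings, a whole group at a time;
    /// scheduling individual queues independently would just leave the
    /// ring's other slots idle.
    fn schedule(&mut self) {
        for ring in self.rings.iter_mut().filter(|r| r.is_none()) {
            let Some(group) = self.runnable.pop() else { break };
            *ring = Some(group.id);
        }
    }
}

fn main() {
    let mut sched = GroupScheduler {
        rings: vec![None; 2], // R = 2 firmware rings
        runnable: vec![
            Group { id: 0, queues: [true, false, false] },
            Group { id: 1, queues: [false, true, true] },
            Group { id: 2, queues: [true, true, false] },
        ],
    };
    sched.schedule();
    println!("ring assignment: {:?}", sched.rings);
}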
