Message-Id: <DD62YFG2CJ36.1NFKRTR2ZKD6V@kernel.org>
Date: Tue, 30 Sep 2025 12:58:29 +0200
From: "Danilo Krummrich" <dakr@...nel.org>
To: "Boris Brezillon" <boris.brezillon@...labora.com>
Cc: "Philipp Stanner" <phasta@...lbox.org>, <phasta@...nel.org>, "Tvrtko
 Ursulin" <tvrtko.ursulin@...lia.com>, <dri-devel@...ts.freedesktop.org>,
 <amd-gfx@...ts.freedesktop.org>, <kernel-dev@...lia.com>,
 <intel-xe@...ts.freedesktop.org>, <cgroups@...r.kernel.org>,
 <linux-kernel@...r.kernel.org>, Christian König
 <christian.koenig@....com>, "Leo Liu" <Leo.Liu@....com>,
 Maíra Canal <mcanal@...lia.com>, "Matthew Brost"
 <matthew.brost@...el.com>, Michal Koutný
 <mkoutny@...e.com>, Michel Dänzer
 <michel.daenzer@...lbox.org>, "Pierre-Eric Pelloux-Prayer"
 <pierre-eric.pelloux-prayer@....com>, "Rob Clark" <robdclark@...il.com>,
 "Tejun Heo" <tj@...nel.org>, "Alexandre Courbot" <acourbot@...dia.com>,
 "Alistair Popple" <apopple@...dia.com>, "John Hubbard"
 <jhubbard@...dia.com>, "Joel Fernandes" <joelagnelf@...dia.com>, "Timur
 Tabi" <ttabi@...dia.com>, "Alex Deucher" <alexander.deucher@....com>,
 "Lucas De Marchi" <lucas.demarchi@...el.com>,
 Thomas Hellström <thomas.hellstrom@...ux.intel.com>,
 "Rodrigo Vivi" <rodrigo.vivi@...el.com>, "Rob Herring" <robh@...nel.org>,
 "Steven Price" <steven.price@....com>, "Liviu Dudau" <liviu.dudau@....com>,
 "Daniel Almeida" <daniel.almeida@...labora.com>, "Alice Ryhl"
 <aliceryhl@...gle.com>, "Boqun Feng" <boqunf@...flix.com>,
 Grégoire Péan <gpean@...flix.com>, "Simona Vetter"
 <simona@...ll.ch>, <airlied@...il.com>
Subject: Re: [RFC v8 00/21] DRM scheduling cgroup controller

On Tue Sep 30, 2025 at 12:12 PM CEST, Boris Brezillon wrote:
> So, my take is that what we ultimately want is to have the
> functionality provided by drm_sched split into different
> components that can be used in isolation, or combined to provide
> advanced scheduling.
>
> JobQueue:
>  - allows you to queue jobs with their deps
>  - dequeues jobs once their deps are met
> Not too sure if we want a push or a pull model for the job dequeuing,
> but the idea is that once the job is dequeued, ownership is passed to
> the SW entity that dequeued it. Note that I intentionally didn't add
> the timeout handling here, because dequeueing a job doesn't necessarily
> mean it's started immediately. If you're dealing with HW queues, you
> might have to wait for a slot to become available. If you're dealing
> with something like Mali-CSF, where the number of FW slots is limited,
> you want to wait for your execution context to be passed to the FW for
> scheduling, and the final situation is the full-fledged FW scheduling,
> where you want things to start as soon as you have space in your FW
> queue (AKA ring-buffer?).
>
> JobHWDispatcher: (not sure about the name, I'm bad at naming things)
> This object basically pulls ready-jobs from one or multiple JobQueues
> into its own queue, and waits for a HW slot to become available. If you
> go for the push model, the job gets pushed to the HW dispatcher queue
> and waits here until a HW slot becomes available.
> That's where timeouts should be handled, because the job only becomes
> active when it gets pushed to a HW slot. I guess if we want a
> resubmit mechanism, it would have to take place here, but given how
> tricky this has been, I'd be tempted to leave that to drivers, that is,
> let them requeue the non-faulty jobs directly to their
> JobHWDispatcher implementation after a reset.
>
> FWExecutionContextScheduler: (again, pick a different name if you want)
> This scheduler doesn't know about jobs, meaning there's a
> driver-specific entity that needs to dequeue jobs from the JobQueue
> and push those to the relevant ringbuffer. Once a FWExecutionContext
> has something to execute, it becomes a candidate for the
> FWExecutionContextScheduler, which gets to decide which set of
> FWExecutionContexts gets a chance to be scheduled by the FW.
> That one is for the Mali-CSF case I described above, and I'm not too sure
> we want it to be generic, at least not until we have another GPU driver
> needing the same kind of scheduling. Again, you want to defer the
> timeout handling to this component, because the timer should only
> start/resume when the FWExecutionContext gets scheduled, and it should
> be paused as soon as the context gets evicted.
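
If I read the proposed split correctly, it boils down to roughly the
following sketch (purely illustrative, all type and function names are
invented, and I've left the FWExecutionContextScheduler out):

// Purely illustrative; all names are invented and nothing here matches
// an in-tree API.

use std::collections::VecDeque;

/// A job plus the number of dependencies that haven't signalled yet.
struct Job {
    unmet_deps: usize,
}

/// JobQueue: accepts jobs with their deps and only hands them out once
/// all deps are met. It knows nothing about HW slots or timeouts.
struct JobQueue {
    waiting: Vec<Job>,
}

impl JobQueue {
    fn queue(&mut self, job: Job) {
        self.waiting.push(job);
    }

    /// Pull model: whoever dequeues the job takes ownership of it.
    fn dequeue_ready(&mut self) -> Option<Job> {
        let idx = self.waiting.iter().position(|j| j.unmet_deps == 0)?;
        Some(self.waiting.swap_remove(idx))
    }
}

/// JobHWDispatcher: pulls ready jobs from one or more JobQueues and
/// parks them until a HW slot frees up; the timeout only starts once a
/// job actually occupies a slot.
struct JobHwDispatcher {
    queued: VecDeque<Job>,
    free_slots: usize,
}

impl JobHwDispatcher {
    fn pull_from(&mut self, queue: &mut JobQueue) {
        while let Some(job) = queue.dequeue_ready() {
            self.queued.push_back(job);
        }
    }

    fn kick(&mut self) {
        while self.free_slots > 0 {
            let Some(_job) = self.queued.pop_front() else { break };
            self.free_slots -= 1;
            // Push the job to a HW slot and arm the timeout timer here.
        }
    }
}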

This sounds pretty much like the existing design with the Panthor group
scheduler layered on top of it, no?

Though, one of the fundamental problems I'd like to get rid of is that job
ownership is transferred between two components with fundamentally different
lifetimes (entity and scheduler).

Instead, I think the new Jobqueue should always own its jobs and always
dispatch them directly, and provide some "control API" through which an
external component (orchestrator) on top of it instructs it when and to
which ring to dispatch jobs.
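
In contrast to the sketch above, that would look roughly like this (again
only a sketch, all names invented):

// Again purely illustrative; all names are invented.

use std::collections::VecDeque;

struct Job {
    id: u64,
}

/// Control API implemented by whatever sits on top (cgroup controller,
/// group scheduler, ...). It steers dispatching but never takes the jobs.
trait DispatchControl {
    /// Which ring should this job go to right now? None means "hold it".
    fn pick_ring(&mut self, job_id: u64) -> Option<u32>;
}

struct JobQueue {
    ready: VecDeque<Job>,
}

impl JobQueue {
    /// The queue itself dispatches; job ownership never changes hands.
    fn dispatch(&mut self, ctrl: &mut dyn DispatchControl) {
        while let Some(job) = self.ready.front() {
            let Some(ring) = ctrl.pick_ring(job.id) else { break };
            let job = self.ready.pop_front().unwrap();
            // Write `job` to `ring`; the queue stays the owner until the
            // job's fence signals.
            let _ = (job, ring);
        }
    }
}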

The group scheduling logic you need for some Mali GPUs can be implemented
either by hooks into this orchestrator or by a separate component that
attaches to the same control API of the Jobqueue.
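
Sticking with the invented names from the sketch above, such a component
would just be one more implementation of that control API, e.g.:

// Continuing the (invented) names from the sketch above: a group
// scheduler that only lets jobs through while its group currently holds
// a FW slot.
struct GroupScheduler {
    /// Ring of the FW slot the group is resident on, if any.
    resident_ring: Option<u32>,
}

impl DispatchControl for GroupScheduler {
    fn pick_ring(&mut self, _job_id: u64) -> Option<u32> {
        // Hold everything back while the group is evicted from the FW.
        self.resident_ring
    }
}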

> TLDR; I think the main problem we had with drm_sched is that it had
> this clear drm_sched_entity/drm_gpu_scheduler separation, but those two
> components were tightly tied together, with no way to use
> drm_sched_entity alone for instance, and this led to the weird
> lifetime/ownership issues that the rust effort made more apparent. If we
> get to design something new, I think we should try hard to get a clear
> isolation between each of these components so they can be used alone or
> combined, with a clear job ownership model.

This I agree with, but as explained above I'd go even one step further.
