linux-kernel - Re: [RFC v8 00/21] DRM scheduling cgroup controller

Message-ID: <4453e5989b38e99588efd53af674b69016b2c420.camel@mailbox.org>
Date: Tue, 30 Sep 2025 11:00:00 +0200
From: Philipp Stanner <phasta@...lbox.org>
To: Danilo Krummrich <dakr@...nel.org>, Tvrtko Ursulin
	 <tvrtko.ursulin@...lia.com>
Cc: dri-devel@...ts.freedesktop.org, amd-gfx@...ts.freedesktop.org, 
 kernel-dev@...lia.com, intel-xe@...ts.freedesktop.org,
 cgroups@...r.kernel.org,  linux-kernel@...r.kernel.org, Christian
 König <christian.koenig@....com>, Leo Liu
 <Leo.Liu@....com>,  Maíra Canal <mcanal@...lia.com>,
 Matthew Brost <matthew.brost@...el.com>, Michal Koutný
 <mkoutny@...e.com>, Michel Dänzer
 <michel.daenzer@...lbox.org>, Philipp Stanner <phasta@...nel.org>, 
 Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>, Rob Clark
 <robdclark@...il.com>, Tejun Heo <tj@...nel.org>, Alexandre Courbot
 <acourbot@...dia.com>, Alistair Popple <apopple@...dia.com>, John Hubbard
 <jhubbard@...dia.com>, Joel Fernandes <joelagnelf@...dia.com>, Timur Tabi
 <ttabi@...dia.com>, Alex Deucher <alexander.deucher@....com>, Lucas De
 Marchi <lucas.demarchi@...el.com>, Thomas Hellström
 <thomas.hellstrom@...ux.intel.com>, Rodrigo Vivi <rodrigo.vivi@...el.com>, 
 Boris Brezillon <boris.brezillon@...labora.com>, Rob Herring
 <robh@...nel.org>, Steven Price <steven.price@....com>,  Liviu Dudau
 <liviu.dudau@....com>, Daniel Almeida <daniel.almeida@...labora.com>, Alice
 Ryhl <aliceryhl@...gle.com>, Boqun Feng <boqunf@...flix.com>, 
 Grégoire Péan <gpean@...flix.com>, Simona Vetter
 <simona@...ll.ch>, airlied@...il.com
Subject: Re: [RFC v8 00/21] DRM scheduling cgroup controller

+Cc Sima, Dave

On Mon, 2025-09-29 at 16:07 +0200, Danilo Krummrich wrote:
> On Wed Sep 3, 2025 at 5:23 PM CEST, Tvrtko Ursulin wrote:
> > This is another respin of this old work^1 which since v7 is a total rewrite and
> > completely changes how the control is done.
> 
> I only got some of the patches of the series, can you please send all of them
> for subsequent submissions? You may also want to consider resending if you're
> not getting a lot of feedback due to that. :)
> 
> > On the userspace interface side of things it is the same as before. We have
> > drm.weight as an interface, taking integers from 1 to 10000, the same as CPU and
> > IO cgroup controllers.
> 
> In general, I think it would be good to get GPU vendors to speak up to what kind
> of interfaces they're heading to with firmware schedulers and potential firmware
> APIs to control scheduling; especially given that this will be a uAPI.
> 
> (Adding a couple of folks to Cc.)
> 
> Having that said, I think the basic drm.weight interface is fine and should work
> in any case; i.e. with the existing DRM GPU scheduler in both modes, the
> upcoming DRM Jobqueue efforts and should be generic enough to work with
> potential firmware interfaces we may see in the future.
> 
> Philipp should be talking about the DRM Jobqueue component at XDC (probably
> right at this moment).
> 
> --
> 
> Some more thoughts on the DRM Jobqueue and scheduling:
> 
> The idea behind the DRM Jobqueue is to be, as the name suggests, a component
> that receives jobs from userspace, handles the dependencies (i.e. dma fences),
> and executes the job, e.g. by writing to a firmware managed software ring.
> 
> It basically does what the GPU scheduler does in 1:1 entity-scheduler mode,
> just without all the additional complexity of moving job ownership from one
> component to another (i.e. from entity to scheduler, etc.).
> 
> With just that, there is no scheduling outside the GPU's firmware scheduler of
> course. However, additional scheduler capabilities, e.g. to support hardware
> rings, or manage firmware schedulers that only support a limited number of
> software rings (like some Mali GPUs), can be layered on top of that:
> 
> In contrast to the existing GPU scheduler, the idea would be to keep letting the
> DRM Jobqueue handle jobs submitted by userspace from end to end (i.e. let it
> push to the hardware (or software) ring buffer), but have an additional
> component, whose only purpose is to orchestrate the DRM Jobqueues, by managing
> when they are allowed to push to a ring and which ring they should push to.
> 
> This way we get rid of the issue that the existing GPU scheduler moves
> job ownership between components of different lifetimes (entity and scheduler),
> which is one of the fundamental hassles to deal with.


So just a few minutes ago I had a long chat with Sima.

Sima thinks (and I think I agree) that the very few GPUs that have a
fairly low limit of firmware rings should just resource-limit
userspace once that limit is reached.

Basically like with VRAM.
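
To make that concrete, here is a minimal userspace sketch of the idea,
not driver code. All names (RingPool, RingsExhausted, acquire, release)
are made up for illustration and do not exist in drm_sched or any
driver; the only assumption is that the firmware exposes a fixed number
of rings.

```rust
/// Error handed back once every firmware ring is in use.
/// Hypothetical; a real driver would map this to an errno.
#[derive(Debug, PartialEq, Eq)]
pub struct RingsExhausted;

/// Hypothetical pool of firmware rings with a hard upper limit.
#[derive(Debug)]
pub struct RingPool {
    /// Indices of rings that are currently not handed out.
    free: Vec<u32>,
}

impl RingPool {
    /// Create a pool for a firmware that offers `limit` rings.
    pub fn new(limit: u32) -> Self {
        // Reversed so that pop() hands out ring 0 first.
        Self { free: (0..limit).rev().collect() }
    }

    /// Hand out one ring, or fail once the limit is reached.
    /// There is no queueing and no multiplexing: the caller simply
    /// gets an error, the same way an allocation fails when VRAM is
    /// exhausted.
    pub fn acquire(&mut self) -> Result<u32, RingsExhausted> {
        self.free.pop().ok_or(RingsExhausted)
    }

    /// Return a ring to the pool, e.g. when its owner goes away.
    pub fn release(&mut self, ring: u32) {
        self.free.push(ring);
    }
}
```

With a pool of two rings, the third acquire() fails until one of the
first two owners calls release(); nothing in the kernel has to decide
who runs when.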

Apparently Sima had suggested that to Panthor in the past? But Panthor
still seems to have implemented yet another scheduler mechanism on top
of the 1:1 entity-scheduler drm_sched setup?

@Boris: Why was that done?

So far I tend to prefer Sima's proposal because I'm currently very
unsure how we could deal with shared firmware rings: we'd then need to
resubmit jobs, and that would put the currently intended Rust ownership
model in danger, because the Jobqueue would need a pending_list.
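
A rough sketch of the ownership concern, purely for illustration. None
of these types are the planned Jobqueue API; Job, Ring,
submit_end_to_end and ResubmittingJobqueue are hypothetical, and the
model assumes that resubmission means replaying every job that has not
completed yet.

```rust
/// Hypothetical job; not Clone, so it has exactly one owner at a time.
#[derive(Debug)]
pub struct Job {
    pub id: u64,
}

/// Hypothetical ring that owns every job pushed to it.
#[derive(Debug, Default)]
pub struct Ring {
    jobs: Vec<Job>,
}

impl Ring {
    pub fn push(&mut self, job: Job) {
        self.jobs.push(job);
    }
    pub fn len(&self) -> usize {
        self.jobs.len()
    }
}

/// End-to-end model: the job is moved to the ring and the submitter
/// keeps nothing, so there is a single owner at any point in time.
pub fn submit_end_to_end(ring: &mut Ring, job: Job) {
    ring.push(job);
}

/// Shared-ring model: jobs have to stay reachable for resubmission,
/// so the queue keeps owning them and the ring only sees their ids.
#[derive(Debug, Default)]
pub struct ResubmittingJobqueue {
    pending_list: Vec<Job>,
    ring_ids: Vec<u64>,
}

impl ResubmittingJobqueue {
    pub fn submit(&mut self, job: Job) {
        self.ring_ids.push(job.id);
        self.pending_list.push(job); // retained until completion
    }
    /// A job finished; only now can the queue drop it.
    pub fn complete(&mut self, id: u64) {
        self.ring_ids.retain(|&i| i != id);
        self.pending_list.retain(|j| j.id != id);
    }
    /// The shared ring was taken away; its contents are lost.
    pub fn evict(&mut self) {
        self.ring_ids.clear();
    }
    /// Replay everything that has not completed yet.
    pub fn resubmit(&mut self) {
        self.ring_ids = self.pending_list.iter().map(|j| j.id).collect();
    }
    pub fn on_ring(&self) -> &[u64] {
        &self.ring_ids
    }
    pub fn pending(&self) -> usize {
        self.pending_list.len()
    }
}
```

In the first model the move into the ring ends the submitter's
involvement. In the second one the queue has to track each job until it
completes, which is the bookkeeping drm_sched already does today.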

So we'd risk redesigning drm_sched, whereas with Sima's idea there'd
no longer be a scheduler anywhere.


P.