Message-ID: <aH9Pj+eIuIgNiL69@lstrano-desk.jf.intel.com>
Date: Tue, 22 Jul 2025 01:45:03 -0700
From: Matthew Brost <matthew.brost@...el.com>
To: <phasta@...nel.org>
CC: Danilo Krummrich <dakr@...nel.org>, James Flowers
	<bold.zone2373@...tmail.com>, <ckoenig.leichtzumerken@...il.com>,
	<maarten.lankhorst@...ux.intel.com>, <mripard@...nel.org>,
	<tzimmermann@...e.de>, <airlied@...il.com>, <simona@...ll.ch>,
	<skhan@...uxfoundation.org>, <dri-devel@...ts.freedesktop.org>,
	<linux-kernel@...r.kernel.org>, <linux-kernel-mentees@...ts.linux.dev>,
	Tvrtko Ursulin <tvrtko.ursulin@...lia.com>
Subject: Re: [PATCH] drm/sched: Prevent stopped entities from being added to
 the run queue.

On Tue, Jul 22, 2025 at 01:07:29AM -0700, Matthew Brost wrote:
> On Tue, Jul 22, 2025 at 09:37:11AM +0200, Philipp Stanner wrote:
> > On Mon, 2025-07-21 at 11:07 -0700, Matthew Brost wrote:
> > > On Mon, Jul 21, 2025 at 12:14:31PM +0200, Danilo Krummrich wrote:
> > > > On Mon Jul 21, 2025 at 10:16 AM CEST, Philipp Stanner wrote:
> > > > > On Mon, 2025-07-21 at 09:52 +0200, Philipp Stanner wrote:
> > > > > > On Sun, 2025-07-20 at 16:56 -0700, James Flowers wrote:
> > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > index bfea608a7106..997a2cc1a635 100644
> > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > @@ -172,8 +172,10 @@ void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
> > > > > > >  
> > > > > > >  	entity->oldest_job_waiting = ts;
> > > > > > >  
> > > > > > > -	rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
> > > > > > > -		      drm_sched_entity_compare_before);
> > > > > > > +	if (!entity->stopped) {
> > > > > > > +		rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
> > > > > > > +			      drm_sched_entity_compare_before);
> > > > > > > +	}
> > > > > > 
> > > > > > If this is a race, then this patch is broken, too, because you're
> > > > > > checking the 'stopped' boolean just as the callers of that function
> > > > > > do – only later. :O
> > > > > > 
> > > > > > Could still race, just less likely.
> > > > > > 
> > > > > > The proper way to fix it would then be to address the issue where the
> > > > > > locking is supposed to happen. Let's look at, for example,
> > > > > > drm_sched_entity_push_job():
> > > > > > 
> > > > > > 
> > > > > > void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> > > > > > {
> > > > > > 	(Bla bla bla)
> > > > > > 
> > > > > >  	…………
> > > > > > 
> > > > > > 	/* first job wakes up scheduler */
> > > > > > 	if (first) {
> > > > > > 		struct drm_gpu_scheduler *sched;
> > > > > > 		struct drm_sched_rq *rq;
> > > > > > 
> > > > > > 		/* Add the entity to the run queue */
> > > > > > 		spin_lock(&entity->lock);
> > > > > > 		if (entity->stopped) {                  <---- Aha!
> > > > > > 			spin_unlock(&entity->lock);
> > > > > > 
> > > > > > 			DRM_ERROR("Trying to push to a killed entity\n");
> > > > > > 			return;
> > > > > > 		}
> > > > > > 
> > > > > > 		rq = entity->rq;
> > > > > > 		sched = rq->sched;
> > > > > > 
> > > > > > 		spin_lock(&rq->lock);
> > > > > > 		drm_sched_rq_add_entity(rq, entity);
> > > > > > 
> > > > > > 		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> > > > > > 			drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); <---- bumm!
> > > > > > 
> > > > > > 		spin_unlock(&rq->lock);
> > > > > > 		spin_unlock(&entity->lock);
> > > > > > 
> > > > > > But the locks are still being held. So that "shouldn't be happening"(tm).
> > > > > > 
> > > > > > Interesting. AFAICS only drm_sched_entity_kill() and drm_sched_fini()
> > > > > > stop entities. The former holds appropriate locks, but drm_sched_fini()
> > > > > > doesn't. So that looks like a hot candidate to me. Opinions?
> > > > > > 
> > > > > > On the other hand, aren't drivers prohibited from calling
> > > > > > drm_sched_entity_push_job() after calling drm_sched_fini()? If the
> > > > > > fuzzer does that, then it's not the scheduler's fault.
> > > > 
> > > > Exactly, this is the first question to ask.
> > > > 
> > > > And I think it's even more restrictive:
> > > > 
> > > > In drm_sched_fini()
> > > > 
> > > > 	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
> > > > 		struct drm_sched_rq *rq = sched->sched_rq[i];
> > > > 
> > > > 		spin_lock(&rq->lock);
> > > > 		list_for_each_entry(s_entity, &rq->entities, list)
> > > > 			/*
> > > > 			 * Prevents reinsertion and marks job_queue as idle,
> > > > 			 * it will be removed from the rq in drm_sched_entity_fini()
> > > > 			 * eventually
> > > > 			 */
> > > > 			s_entity->stopped = true;
> > > > 		spin_unlock(&rq->lock);
> > > > 		kfree(sched->sched_rq[i]);
> > > > 	}
> > > > 
> > > > In drm_sched_entity_kill()
> > > > 
> > > > 	static void drm_sched_entity_kill(struct drm_sched_entity *entity)
> > > > 	{
> > > > 		struct drm_sched_job *job;
> > > > 		struct dma_fence *prev;
> > > > 
> > > > 		if (!entity->rq)
> > > > 			return;
> > > > 
> > > > 		spin_lock(&entity->lock);
> > > > 		entity->stopped = true;
> > > > 		drm_sched_rq_remove_entity(entity->rq, entity);
> > > > 		spin_unlock(&entity->lock);
> > > > 
> > > > 		[...]
> > > > 	}
> > > > 
> > > > If this runs concurrently, this is a UAF as well.
> > > > 
> > > > Personally, I have always been working with the assumption that entities have to
> > > > be torn down *before* the scheduler, but those lifetimes are not documented
> > > > properly.
> > > 
> > > Yes, this is my assumption too. I would even take it further: an entity
> > > shouldn't be torn down until all jobs associated with it are freed as
> > > well. I think this would solve a lot of issues I've seen on the list
> > > related to UAF, teardown, etc.
> > 
> > That's kind of impossible with the new teardown design, because
> > drm_sched_fini() ensures that all jobs are freed on teardown. And
> > drm_sched_fini() wouldn't be called before all jobs are gone,
> > effectively resulting in a chicken-and-egg problem, or rather: the driver
> > implementing its own solution for teardown.
> > 
> 
> I've read this four times and I'm still generally confused.
> 
> "drm_sched_fini ensures that all jobs are freed on teardown" — Yes,
> that's how a refcounting-based solution works. drm_sched_fini would
> never be called if there were pending jobs.
> 
> "drm_sched_fini() wouldn't be called before all jobs are gone" — See
> above.
> 
> "effectively resulting in a chicken-and-egg problem" — A job is created
> after the scheduler, and it holds a reference to the scheduler until
> it's freed. I don't see how this idiom applies.
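> 
> Roughly, the idiom looks like this (untested sketch, not Xe's actual
> code; the my_sched / my_job_* names are made up for illustration):
> 
> 	/* Driver-side scheduler wrapper; each job holds a reference on it. */
> 	struct my_sched {
> 		struct drm_gpu_scheduler base;
> 		struct kref refcount;
> 	};
> 
> 	static void my_sched_release(struct kref *kref)
> 	{
> 		struct my_sched *s = container_of(kref, struct my_sched,
> 						  refcount);
> 
> 		/* Last job has dropped its reference, no pending jobs left. */
> 		drm_sched_fini(&s->base);
> 		kfree(s);
> 	}
> 
> 	/* Called when a job is created (the scheduler already exists). */
> 	static void my_job_get_sched(struct my_sched *s)
> 	{
> 		kref_get(&s->refcount);
> 	}
> 
> 	/* Called from the driver's free_job path, once the job is done. */
> 	static void my_job_put_sched(struct my_sched *s)
> 	{
> 		kref_put(&s->refcount, my_sched_release);
> 	}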
> 
> "the driver implementing its own solution for teardown" — It’s just
> following the basic lifetime rules I outlined below. Perhaps Xe was
> ahead of its time, but the number of DRM scheduler blowups we've had is
> zero — maybe a strong indication that this design is correct.
> 

Sorry—self-reply.

To expand on this: the reason Xe implemented a refcount-based teardown
solution is that the internals of the DRM scheduler during teardown
looked wildly scary. A lower layer should not impose its will on upper
layers. I think that’s the root cause of all the problems I've listed.

In my opinion, we should document the lifetime rules I’ve outlined, fix
all drivers accordingly, and assert these rules in the scheduler layer.
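
For example, the "mark everything stopped" loop Danilo quoted from
drm_sched_fini() could become an assert along these lines (untested
sketch, purely to illustrate the rule):

	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
		struct drm_sched_rq *rq = sched->sched_rq[i];

		spin_lock(&rq->lock);
		/*
		 * Lifetime rule: entities must be torn down before the
		 * scheduler, so no entity may still be on a run queue here.
		 */
		WARN_ON(!list_empty(&rq->entities));
		spin_unlock(&rq->lock);

		kfree(sched->sched_rq[i]);
	}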

Matt

> Matt
> 
> > P.
> > 
> > 
> > > 
> > > > 
> > > > There are two solutions:
> > > > 
> > > >   (1) Strictly require all entities to be torn down before drm_sched_fini(),
> > > >       i.e. stick to the natural ownership and lifetime rules here (see below).
> > > > 
> > > >   (2) Actually protect *any* changes of the relevant fields of the entity
> > > >       structure with the entity lock.
> > > > 
> > > > While (2) seems rather obvious, we run into lock inversion with this approach,
> > > > as you note below as well. And I think drm_sched_fini() should not mess with
> > > > entities anyways.
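> > > > 
> > > > Spelled out against the push_job path quoted above:
> > > > 
> > > > 	/*
> > > > 	 * drm_sched_entity_push_job() takes entity->lock, then rq->lock.
> > > > 	 * drm_sched_fini() already holds rq->lock while walking
> > > > 	 * rq->entities, so taking entity->lock in there would be the
> > > > 	 * reverse order, i.e. a classic ABBA inversion.
> > > > 	 */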
> > > > 
> > > > The ownership here seems obvious:
> > > > 
> > > > The scheduler *owns* a resource that is used by entities. Consequently, entities
> > > > are not allowed to outlive the scheduler.
> > > > 
> > > > Surely, the current implementation to just take the resource away from the
> > > > entity under the hood can work as well with appropriate locking, but that's a
> > > > mess.
> > > > 
> > > > If the resource *really* needs to be shared for some reason (which I don't see),
> > > > shared ownership, i.e. reference counting, is much less error prone.
> > > 
> > > Yes, Xe solves all of this via reference counting (jobs refcount the
> > > entity). It's a bit easier in Xe since the scheduler and entities are
> > > the same object due to their 1:1 relationship. But even in non-1:1
> > > relationships, an entity could refcount the scheduler. The teardown
> > > sequence would then be: all jobs complete on the entity → teardown the
> > > entity → all entities torn down → teardown the scheduler.
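> > > 
> > > As a rough sketch of the jobs-refcount-the-entity part (not Xe's
> > > actual code; my_entity / my_entity_release are made-up names):
> > > 
> > > 	struct my_entity {
> > > 		struct drm_sched_entity base;
> > > 		struct kref refcount;	/* one reference per in-flight job */
> > > 	};
> > > 
> > > 	static void my_entity_release(struct kref *kref)
> > > 	{
> > > 		struct my_entity *e = container_of(kref, struct my_entity,
> > > 						   refcount);
> > > 
> > > 		/* Last job is gone, now it is safe to tear the entity down. */
> > > 		drm_sched_entity_fini(&e->base);
> > > 		/* In the non-1:1 case the entity's reference on the
> > > 		 * scheduler would be dropped here as well. */
> > > 		kfree(e);
> > > 	}
> > > 
> > > 	/* Jobs do kref_get(&e->refcount) on creation and
> > > 	 * kref_put(&e->refcount, my_entity_release) from free_job. */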
> > > 
> > > Matt
> > 
