Message-ID: <2e08b320a9d81faee6d1ec2a3fe8a1df6773c8f6.camel@mailbox.org>
Date: Wed, 29 Oct 2025 08:30:48 +0100
From: Philipp Stanner <phasta@...lbox.org>
To: Matthew Brost <matthew.brost@...el.com>, Philipp Stanner
<phasta@...nel.org>
Cc: Danilo Krummrich <dakr@...nel.org>, Christian König
<ckoenig.leichtzumerken@...il.com>, Maarten Lankhorst
<maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>, tursulin@...ulin.net,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] drm/sched: Add FIXME detailing potential hang
On Tue, 2025-10-28 at 12:43 -0700, Matthew Brost wrote:
> On Tue, Oct 28, 2025 at 02:46:02PM +0100, Philipp Stanner wrote:
> > If a job from a ready entity needs more credits than are currently
> > available, drm_sched_run_job_work() (a work item) simply returns and
> > doesn't reschedule itself. The scheduler is only woken up again when the
> > next job gets pushed with drm_sched_entity_push_job().
> >
> > If someone submits a job that needs too many credits and doesn't submit
> > more jobs afterwards, this would lead to the scheduler never pulling the
> > too-expensive job, effectively hanging forever.
> >
> > Document this problem as a FIXME.
> >
> > Signed-off-by: Philipp Stanner <phasta@...nel.org>
> > ---
> > drivers/gpu/drm/scheduler/sched_main.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 492e8af639db..eaf8d17b2a66 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1237,6 +1237,16 @@ static void drm_sched_run_job_work(struct work_struct *w)
> >
> > /* Find entity with a ready job */
> > entity = drm_sched_select_entity(sched);
> > + /*
> > + * FIXME:
> > + * The entity can be NULL when the scheduler currently has no capacity
> > + * (credits) for more jobs. If that happens, the work item terminates
> > + * itself here, without rescheduling itself.
> > + *
> > + * It only gets started again in drm_sched_entity_push_job(). IOW, the
> > + * scheduler might hang forever if a job that needs too many credits
> > + * gets submitted to an entity and no other, subsequent jobs are.
> > + */
>
> drm_sched_job_done frees the credits, which triggers
> drm_sched_free_job_work, and that in turn triggers
> drm_sched_run_job_work.
Sounds correct to me.
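
The wake-up chain described above can be modeled in a few lines of userspace C. This is only a toy sketch under stated assumptions: the names toy_sched, toy_run_job_work() and toy_job_done() are hypothetical, nothing here is the real drm/sched code, and the real scheduler queues work items instead of calling functions directly.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not the drm/sched data structures. */
struct toy_sched {
	unsigned int credit_limit;   /* total credits the ring offers */
	unsigned int credits_in_use; /* credits held by jobs in flight */
	unsigned int pending_cost;   /* cost of the queued job, 0 if none */
	unsigned int jobs_started;   /* jobs handed to the "hardware" so far */
};

/*
 * Models the run-job work item: start the queued job if it fits into the
 * available credits, otherwise return without rescheduling.
 */
bool toy_run_job_work(struct toy_sched *s)
{
	if (s->pending_cost == 0)
		return false;

	if (s->pending_cost > s->credit_limit - s->credits_in_use)
		return false; /* not enough credits right now */

	s->credits_in_use += s->pending_cost;
	s->pending_cost = 0;
	s->jobs_started++;
	return true;
}

/*
 * Models job completion: release the job's credits, then run the run-job
 * step again. The direct call stands in for the free-job work item
 * re-triggering the run-job work item.
 */
void toy_job_done(struct toy_sched *s, unsigned int cost)
{
	s->credits_in_use -= cost;
	toy_run_job_work(s);
}
```

In this model, a queued job of cost 2 with 3 of 4 credits in use is not started by toy_run_job_work(), but it is started as soon as toy_job_done() releases the 3 credits, without any further job being pushed.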
We can still merge #1, though, for a bit more clarity.
P.