Message-ID: <20250320085303.71803639@collabora.com>
Date: Thu, 20 Mar 2025 08:53:03 +0100
From: Boris Brezillon <boris.brezillon@...labora.com>
To: Adrian Larumbe <adrian.larumbe@...labora.com>
Cc: Ashley Smith <ashley.smith@...labora.com>, Steven Price
<steven.price@....com>, Liviu Dudau <liviu.dudau@....com>, Maarten
Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard
<mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>, David Airlie
<airlied@...il.com>, Simona Vetter <simona@...ll.ch>, Heiko Stuebner
<heiko@...ech.de>, kernel@...labora.com, Daniel Stone
<daniels@...labora.com>, dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] drm/panthor: Make the timeout per-queue instead of per-job

On Wed, 19 Mar 2025 19:51:47 +0000
Adrian Larumbe <adrian.larumbe@...labora.com> wrote:
> On 10.03.2025 13:30, Ashley Smith wrote:
> > The timeout logic provided by drm_sched leads to races when we try
> > to suspend it while the drm_sched workqueue queues more jobs. Let's
> > overhaul the timeout handling in panthor to have our own delayed work
> > that's resumed/suspended when a group is resumed/suspended. When an
> > actual timeout occurs, we call drm_sched_fault() to report it
> > through drm_sched, still. But otherwise, the drm_sched timeout is
> > disabled (set to MAX_SCHEDULE_TIMEOUT), which leaves us in control of
> > how we protect modifications on the timer.
> >
> > One issue seems to be when we call drm_sched_suspend_timeout() from
> > both queue_run_job() and tick_work() which could lead to races due to
> > drm_sched_suspend_timeout() not having a lock. Another issue seems to
> > be in queue_run_job() if the group is not scheduled, we suspend the
> > timeout again which undoes what drm_sched_job_begin() did when calling
> > drm_sched_start_timeout(). So the timeout does not reset when a job
> > is finished.
> >
> > Co-developed-by: Boris Brezillon <boris.brezillon@...labora.com>
> > Signed-off-by: Boris Brezillon <boris.brezillon@...labora.com>
> > Tested-by: Daniel Stone <daniels@...labora.com>
> > Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
> > Signed-off-by: Ashley Smith <ashley.smith@...labora.com>
>
> Reviewed-by: Adrián Larumbe <adrian.larumbe@...labora.com>
>
> > ---
> > drivers/gpu/drm/panthor/panthor_sched.c | 233 +++++++++++++++++-------
> > 1 file changed, 167 insertions(+), 66 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > index 4d31d1967716..5f02d2ec28f9 100644
> > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > @@ -360,17 +360,20 @@ struct panthor_queue {
> > /** @entity: DRM scheduling entity used for this queue. */
> > struct drm_sched_entity entity;
> >
> > - /**
> > - * @remaining_time: Time remaining before the job timeout expires.
> > - *
> > - * The job timeout is suspended when the queue is not scheduled by the
> > - * FW. Every time we suspend the timer, we need to save the remaining
> > - * time so we can restore it later on.
> > - */
> > - unsigned long remaining_time;
> > + /** @timeout: Queue timeout related fields. */
> > + struct {
> > + /** @timeout.work: Work executed when a queue timeout occurs. */
> > + struct delayed_work work;
>
> Nit: Maybe for the sake of sticking to the convention of naming already
> existing delayed_work structs in a way that reflects their goal, call
> this one 'timeout_work'.

It's already under the timeout struct, and naming it timeout_work would
be redundant IMHO (timeout.timeout_work vs timeout.work).
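
To make the naming point concrete, here is a rough userspace sketch of how the access path reads with the nested struct. It is not the patch itself: struct delayed_work is a one-field stand-in for the kernel type, struct panthor_queue is trimmed down to the single field under discussion, and queue_arm_timeout()/queue_timeout_armed() are hypothetical helpers that exist only for this illustration.

```c
#include <stdbool.h>

/* Stand-in for the kernel's struct delayed_work (assumption: userspace only). */
struct delayed_work {
	bool pending;
};

/* Trimmed-down queue: only the field discussed in this thread. */
struct panthor_queue {
	/* Queue timeout related fields. */
	struct {
		/* Work executed when a queue timeout occurs. */
		struct delayed_work work;
	} timeout;
};

/* Hypothetical helper: the enclosing struct already says "timeout". */
static void queue_arm_timeout(struct panthor_queue *queue)
{
	queue->timeout.work.pending = true;
}

/* Hypothetical helper: reports whether the stand-in work is marked pending. */
static bool queue_timeout_armed(const struct panthor_queue *queue)
{
	return queue->timeout.work.pending;
}
```

With the suggested rename, the same accesses would read queue->timeout.timeout_work, repeating information the enclosing struct name already carries.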