[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <95f663076b59fc85818d14a1a14416b88aae1c73.camel@mailbox.org>
Date: Mon, 10 Mar 2025 13:37:52 +0100
From: Philipp Stanner <phasta@...lbox.org>
To: Christian König <christian.koenig@....com>,
phasta@...nel.org, Christian König
<ckoenig.leichtzumerken@...il.com>, matthew.brost@...el.com,
dakr@...nel.org, tvrtko.ursulin@...lia.com,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] drm/sched: revert "drm_sched_job_cleanup(): correct
false doc"
On Mon, 2025-03-10 at 13:25 +0100, Christian König wrote:
> Am 10.03.25 um 13:11 schrieb Philipp Stanner:
> > On Mon, 2025-03-10 at 08:44 +0100, Christian König wrote:
> > > This reverts commit 44d2f310f008613c1dbe5e234c2cf2be90cbbfab.
> > OK, your arguments with fence ordering are strong. Please update
> > the
> > commit message according to our discussion:
> >
> > > Sorry for the delayed response, I only stumbled over this now
> > > while
> > > going
> > > over old mails and then re-thinking my reviewed by for this
> > > change.
> > Your RB hadn't even been applied (I merged before you gave it), so
> > you
> > can remove this first paragraph from the commit message
> >
> > > The function drm_sched_job_arm() is indeed the point of no
> > > return.
> > > The
> > > background is that it is nearly impossible for the driver to
> > > correctly
> > > retract the fence and signal it in the order enforced by the
> > > dma_fence
> > > framework.
> > >
> > > The code in drm_sched_job_cleanup() is for the purpose to cleanup
> > > after
> > > the job was armed through drm_sched_job_arm() *and* processed by
> > > the
> > > scheduler.
> > >
> > > The correct approach for error handling in this situation is to
> > > set
> > > the
> > > error on the fences and then push to the entity anyway. We can
> > > certainly
> > > improve the documentation, but removing the warning is clearly
> > > not a
> > > good
> > > idea.
> > This last paragraph, as per our discussion, seems invalid. We
> > shouldn't
> > have that in the commit log, so that it won't give later hackers
> > browsing it wrong ideas and we end up with someone actually
> > mengling
> > with those fences.
>
> Sure, going to make those updates. I just wanted to give people a
> possible direction to look into when they really run into a situation
> where they need to abort some submission very late.
>
> Should I also clarify the comment in drm_sched_job_cleanup()?
Shouldn't be done in a revert-commit I think.
I'd say we revert for now (it's not a big loss, just a bit of docu) and
then you (or maybe I) submit a new patch where we discuss Tvrtko's
questions and then write down the rules and issues precisely in both
commit message and the function's comments.
P.
>
> Regards,
> Christian.
>
> >
> > Thx
> > P.
> >
> > > Signed-off-by: Christian König <christian.koenig@....com>
> > > ---
> > > drivers/gpu/drm/scheduler/sched_main.c | 12 +++++-------
> > > 1 file changed, 5 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 53e6aec37b46..4d4219fbe49d 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -1015,13 +1015,11 @@
> > > EXPORT_SYMBOL(drm_sched_job_has_dependency);
> > > * Cleans up the resources allocated with drm_sched_job_init().
> > > *
> > > * Drivers should call this from their error unwind code if @job
> > > is
> > > aborted
> > > - * before it was submitted to an entity with
> > > drm_sched_entity_push_job().
> > > + * before drm_sched_job_arm() is called.
> > > *
> > > - * Since calling drm_sched_job_arm() causes the job's fences to
> > > be
> > > initialized,
> > > - * it is up to the driver to ensure that fences that were
> > > exposed to
> > > external
> > > - * parties get signaled. drm_sched_job_cleanup() does not ensure
> > > this.
> > > - *
> > > - * This function must also be called in &struct
> > > drm_sched_backend_ops.free_job
> > > + * After that point of no return @job is committed to be
> > > executed by
> > > the
> > > + * scheduler, and this function should be called from the
> > > + * &drm_sched_backend_ops.free_job callback.
> > > */
> > > void drm_sched_job_cleanup(struct drm_sched_job *job)
> > > {
> > > @@ -1032,7 +1030,7 @@ void drm_sched_job_cleanup(struct
> > > drm_sched_job
> > > *job)
> > > /* drm_sched_job_arm() has been called */
> > > dma_fence_put(&job->s_fence->finished);
> > > } else {
> > > - /* aborted job before arming */
> > > + /* aborted job before committing to run it */
> > > drm_sched_fence_free(job->s_fence);
> > > }
> > >
>
Powered by blists - more mailing lists