linux-kernel - Re: [PATCH] drm/sched: drm_sched_job

Message-ID: <6405f50e3c78d0a06c5dd1b82d4a0a84bdf9d24f.camel@mailbox.org>
Date: Tue, 04 Mar 2025 08:27:42 +0100
From: Philipp Stanner <phasta@...lbox.org>
To: Tvrtko Ursulin <tvrtko.ursulin@...lia.com>, Philipp Stanner
 <phasta@...nel.org>, Matthew Brost <matthew.brost@...el.com>, Danilo
 Krummrich <dakr@...nel.org>, Christian König
 <ckoenig.leichtzumerken@...il.com>, Maarten Lankhorst
 <maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>, 
 Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>,
 Simona Vetter <simona@...ll.ch>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] drm/sched: drm_sched_job_cleanup(): correct false doc

On Mon, 2025-03-03 at 17:13 +0000, Tvrtko Ursulin wrote:
> 
> On 03/03/2025 14:32, Philipp Stanner wrote:
> > drm_sched_job_cleanup()'s documentation claims that calling
> > drm_sched_job_arm() is a "point of no return", implying that afterwards
> > a job cannot be cancelled anymore.
> > 
> > This is not correct, as proven by the function's code itself, which
> > takes a previous call to drm_sched_job_arm() into account. In truth, the
> > decisive factors are whether fences have been shared (e.g., with other
> > processes) and if the job has been submitted to an entity already.
> > 
> > Correct the wrong docstring.
> > 
> > Signed-off-by: Philipp Stanner <phasta@...nel.org>
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 10 ++++++----
> >   1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index c634993f1346..db9e08e6e455 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1013,11 +1013,13 @@ EXPORT_SYMBOL(drm_sched_job_has_dependency);
> >    * Cleans up the resources allocated with drm_sched_job_init().
> >    *
> >    * Drivers should call this from their error unwind code if @job is aborted
> > - * before drm_sched_job_arm() is called.
> > + * before it was submitted to an entity with drm_sched_entity_push_job().
> >    *
> > - * After that point of no return @job is committed to be executed by the
> > - * scheduler, and this function should be called from the
> > - * &drm_sched_backend_ops.free_job callback.
> > + * Since calling drm_sched_job_arm() causes the job's fences to be initialized,
> > + * it is up to the driver to ensure that fences that were exposed to external
> > + * parties get signaled. drm_sched_job_cleanup() does not ensure this.
> > + *
> > + * This function must also be called in &struct drm_sched_backend_ops.free_job
> >    */
> >   void drm_sched_job_cleanup(struct drm_sched_job *job)
> >   {
> 
> I also do not see anything so special in the flows which would disallow
> "aborting" jobs after arming. And it definitely can happen in
> practice. It was probably just unclear kerneldoc.

I have wondered for a few months why they wrote that, and the only
reason I can imagine is that they were worried about the initialized
fences being shared.

> 
> What I would suggest to add to the patch is to change the comment in 
> drm_sched_job_cleanup() implementation:
> 
> This one:
> 
> 		/* aborted job before committing to run it */
> 
> Probably just change to:
> 
> 		/* aborted before arming */

Yes, good catch. Will do.

P.

> 
> With that you can add my r-b.
> 
> Regards,
> 
> Tvrtko
>
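
For illustration only (not part of the patch or of the thread): a minimal
userspace sketch of the two cleanup paths discussed above, assuming that
cleanup decides by looking at whether the job's fence was already
initialized by arming. All mock_* names are hypothetical stand-ins, not
kernel types or APIs, and the reference counting is deliberately
simplified.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
struct mock_fence {
	int refcount;	/* stays 0 until the job is armed */
};

struct mock_job {
	struct mock_fence *s_fence;
};

enum mock_cleanup_path {
	MOCK_ABORTED_BEFORE_ARMING,
	MOCK_ABORTED_AFTER_ARMING,
};

/* Models job init: allocate the fence but do not initialize it yet. */
static int mock_job_init(struct mock_job *job)
{
	job->s_fence = calloc(1, sizeof(*job->s_fence));
	return job->s_fence ? 0 : -1;
}

/* Models arming: initializing the fence gives it its first reference. */
static void mock_job_arm(struct mock_job *job)
{
	job->s_fence->refcount = 1;
}

/* Models cleanup: works both before and after arming. */
static enum mock_cleanup_path mock_job_cleanup(struct mock_job *job)
{
	enum mock_cleanup_path path;

	assert(job->s_fence != NULL);

	if (job->s_fence->refcount > 0) {
		/* Armed: drop our reference, free only on the last put. */
		path = MOCK_ABORTED_AFTER_ARMING;
		if (--job->s_fence->refcount == 0)
			free(job->s_fence);
	} else {
		/* Aborted before arming: fence was never initialized. */
		path = MOCK_ABORTED_BEFORE_ARMING;
		free(job->s_fence);
	}

	job->s_fence = NULL;
	return path;
}
```

The sketch does not model signaling at all. As the proposed docstring
says, making sure that fences already exposed to external parties get
signaled remains the driver's job.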