lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1457e985f88e02cd04b2152d2468a65b7a513e63.camel@mailbox.org>
Date: Thu, 20 Feb 2025 16:28:57 +0100
From: Philipp Stanner <phasta@...lbox.org>
To: Maíra Canal <mcanal@...lia.com>, Philipp Stanner
 <phasta@...nel.org>, Matthew Brost <matthew.brost@...el.com>, Danilo
 Krummrich <dakr@...nel.org>, Christian König
 <ckoenig.leichtzumerken@...il.com>, Maarten Lankhorst
 <maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>, 
 Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>,
 Simona Vetter <simona@...ll.ch>, Tvrtko Ursulin <tvrtko.ursulin@...lia.com>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 2/3] drm/sched: Adjust outdated docu for run_job()

On Thu, 2025-02-20 at 10:28 -0300, Maíra Canal wrote:
> Hi Philipp,
> 
> On 20/02/25 08:28, Philipp Stanner wrote:
> > The documentation for drm_sched_backend_ops.run_job() mentions a
> > certain
> > function called drm_sched_job_recovery(). This function does not
> > exist.
> > What's actually meant is drm_sched_resubmit_jobs(), which is by now
> > also
> > deprecated.
> > 
> > Remove the mention of the removed function.
> > 
> > Discourage the behavior of drm_sched_backend_ops.run_job() being
> > called
> > multiple times for the same job.
> 
> It looks odd to me that this patch removes lines that were added in
> patch 1/3. Maybe you could change the patchset order and place this
> one
> as the first.
> 
> > 
> > Signed-off-by: Philipp Stanner <phasta@...nel.org>
> > ---
> >   include/drm/gpu_scheduler.h | 19 +++++++++++++------
> >   1 file changed, 13 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/drm/gpu_scheduler.h
> > b/include/drm/gpu_scheduler.h
> > index 916279b5aa00..29e5bda91806 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -421,20 +421,27 @@ struct drm_sched_backend_ops {
> >   
> >   	/**
> >   	 * @run_job: Called to execute the job once all of the
> > dependencies
> > -	 * have been resolved. This may be called multiple times,
> > if
> > -	 * timedout_job() has happened and
> > drm_sched_job_recovery() decides to
> > -	 * try it again.
> > +	 * have been resolved.
> > +	 *
> > +	 * The deprecated drm_sched_resubmit_jobs() (called from
> > +	 * drm_sched_backend_ops.timedout_job()) can invoke this
> > again with the
> 
> I think it would be "@timedout_job".

Not sure, isn't referencing in docstrings done with '&'?

> 
> > +	 * same parameters. Using this is discouraged because it,
> > presumably,
> > +	 * violates dma_fence rules.
> 
> I believe it would be "struct dma_fence".

Well, in this case strictly speaking not IMO, because it's about the
rules of the "DMA Fence Subsystem", not about the struct itself.

I'd just keep it that way or call it "dma fence"

> 
> > +	 *
> > +	 * TODO: Document which fence rules above.
> >   	 *
> >   	 * @sched_job: the job to run
> >   	 *
> > -	 * Returns: dma_fence the driver must signal once the
> > hardware has
> > -	 *	completed the job ("hardware fence").
> > -	 *
> >   	 * Note that the scheduler expects to 'inherit' its own
> > reference to
> >   	 * this fence from the callback. It does not invoke an
> > extra
> >   	 * dma_fence_get() on it. Consequently, this callback must
> > take a
> >   	 * reference for the scheduler, and additional ones for
> > the driver's
> >   	 * respective needs.
> 
> Would it be possible to add a comment that `run_job()` must check if
> `s_fence->finished.error` is different than 0? If you increase the
> karma
> of a job and don't check for `s_fence->finished.error`, you might run
> a
> cancelled job.

s_fence->finished is only signaled and its error set once the hardware
fence got signaled; or when the entity is killed.

In any case, signaling "finished" will cause the job to be prevented
from being executed (again), and will never reach run_job() in the
first place.

Correct me if I am mistaken.

Or are you suggesting that there is a race?


P.

> 
> > +	 *
> > +	 * Return:
> > +	 * * On success: dma_fence the driver must signal once the
> > hardware has
> > +	 * completed the job ("hardware fence").
> 
> A suggestion: "the fence that the driver must signal once the
> hardware
> has completed the job".
> 
> Best Regards,
> - Maíra
> 
> > +	 * * On failure: NULL or an ERR_PTR.
> >   	 */
> >   	struct dma_fence *(*run_job)(struct drm_sched_job
> > *sched_job);
> >   
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ