Message-ID: <7502a793-4e92-4bfc-9c87-66bd7fdd88ce@igalia.com>
Date: Fri, 7 Mar 2025 15:09:50 -0300
From: Maíra Canal <mcanal@...lia.com>
To: Philipp Stanner <phasta@...nel.org>,
 Matthew Brost <matthew.brost@...el.com>, Danilo Krummrich <dakr@...nel.org>,
 Christian König <ckoenig.leichtzumerken@...il.com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 Sumit Semwal <sumit.semwal@...aro.org>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 1/3] drm/sched: Adjust outdated docu for run_job()

Hi Philipp,

On 05/03/25 10:05, Philipp Stanner wrote:
> The documentation for drm_sched_backend_ops.run_job() mentions a certain
> function called drm_sched_job_recovery(). This function does not exist.
> What's actually meant is drm_sched_resubmit_jobs(), which is by now also
> deprecated.
> 
> Furthermore, the scheduler expects to "inherit" a reference on the fence
> from the run_job() callback. This, so far, is also not documented.
> 
> Remove the mention of the removed function.
> 
> Discourage the behavior of drm_sched_backend_ops.run_job() being called
> multiple times for the same job.
> 
> Document the necessity of incrementing the refcount in run_job().
> 
> Signed-off-by: Philipp Stanner <phasta@...nel.org>
> ---
>   include/drm/gpu_scheduler.h | 34 ++++++++++++++++++++++++++++++----
>   1 file changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 50928a7ae98e..6381baae8024 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -410,10 +410,36 @@ struct drm_sched_backend_ops {
>   					 struct drm_sched_entity *s_entity);
>   
>   	/**
> -         * @run_job: Called to execute the job once all of the dependencies
> -         * have been resolved.  This may be called multiple times, if
> -	 * timedout_job() has happened and drm_sched_job_recovery()
> -	 * decides to try it again.
> +	 * @run_job: Called to execute the job once all of the dependencies
> +	 * have been resolved.
> +	 *
> +	 * @sched_job: the job to run
> +	 *
> +	 * The deprecated drm_sched_resubmit_jobs() (called by &struct
> +	 * drm_sched_backend_ops.timedout_job) can invoke this again with the
> +	 * same parameters. Using this is discouraged because it violates
> +	 * dma_fence rules, notably dma_fence_init() has to be called on
> +	 * already initialized fences for a second time. Moreover, this is
> +	 * dangerous because attempts to allocate memory might deadlock with
> +	 * memory management code waiting for the reset to complete.

Thanks for adding this paragraph! Also, thanks Christian for providing
this explanation in v5. It really helped clarify the reasoning behind
deprecating drm_sched_resubmit_jobs().

Best Regards,
- Maíra

> +	 *
> +	 * TODO: Document what drivers should do / use instead.
> +	 *
> +	 * This method is called in a workqueue context - either from the
> +	 * submit_wq the driver passed through drm_sched_init(), or, if the
> +	 * driver passed NULL, a separate, ordered workqueue the scheduler
> +	 * allocated.
> +	 *
> +	 * Note that the scheduler expects to 'inherit' its own reference to
> +	 * this fence from the callback. It does not invoke an extra
> +	 * dma_fence_get() on it. Consequently, this callback must take a
> +	 * reference for the scheduler, and additional ones for the driver's
> +	 * respective needs.
> +	 *
> +	 * Return:
> +	 * * On success: dma_fence the driver must signal once the hardware has
> +	 * completed the job ("hardware fence").
> +	 * * On failure: NULL or an ERR_PTR.
>   	 */
>   	struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
>
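
For readers following the refcount paragraph in the quoted kernel-doc, here is a minimal user-space sketch of the "inherited reference" pattern. It is not kernel code: struct toy_fence, toy_fence_get(), toy_fence_put(), struct toy_job and toy_run_job() are hypothetical stand-ins for dma_fence, dma_fence_get(), dma_fence_put(), a driver job and run_job(), and they model only the reference count. The assumption illustrated is the one stated in the patch: the callback returns a fence carrying one reference for the scheduler, plus whatever the driver keeps for itself.

```c
#include <stdlib.h>

/* Toy stand-in for struct dma_fence: only the refcount is modelled. */
struct toy_fence {
	int refcount;
};

/* Hypothetical allocator: the creator holds the initial reference. */
static struct toy_fence *toy_fence_create(void)
{
	struct toy_fence *f = malloc(sizeof(*f));

	if (f)
		f->refcount = 1;
	return f;
}

/* Models dma_fence_get(): take one additional reference. */
static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	f->refcount++;
	return f;
}

/* Models dma_fence_put(): drop one reference, return 1 if freed. */
static int toy_fence_put(struct toy_fence *f)
{
	if (--f->refcount == 0) {
		free(f);
		return 1;
	}
	return 0;
}

/* Hypothetical driver job that keeps its own reference to the fence. */
struct toy_job {
	struct toy_fence *hw_fence;
};

/*
 * Models run_job(): the driver keeps the initial reference in the job
 * and takes a second one that the caller (the "scheduler") inherits.
 * The caller does not take an extra reference itself.
 */
static struct toy_fence *toy_run_job(struct toy_job *job)
{
	struct toy_fence *f = toy_fence_create();

	if (!f)
		return NULL;

	job->hw_fence = f;          /* driver's reference */
	return toy_fence_get(f);    /* scheduler's reference */
}

/* Walks through the lifecycle; returns 0 if the counts behave as expected. */
static int toy_demo(void)
{
	struct toy_job job = { 0 };
	struct toy_fence *sched_ref = toy_run_job(&job);

	if (!sched_ref || sched_ref->refcount != 2)
		return -1;
	if (toy_fence_put(sched_ref) != 0)     /* scheduler drops its ref */
		return -1;
	if (toy_fence_put(job.hw_fence) != 1)  /* driver drops the last ref */
		return -1;
	return 0;
}
```

Calling toy_demo() from a main() exercises the whole sequence. If toy_run_job() returned the fence without the extra toy_fence_get(), the scheduler's put would free the fence while the driver still held a pointer to it, which is the kind of imbalance the new documentation is meant to prevent.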