Message-ID: <12c53d41-21c4-443d-a572-fd22c3cc56ad@igalia.com>
Date: Thu, 20 Feb 2025 10:28:53 -0300
From: Maíra Canal <mcanal@...lia.com>
To: Philipp Stanner <phasta@...nel.org>,
 Matthew Brost <matthew.brost@...el.com>, Danilo Krummrich <dakr@...nel.org>,
 Christian König <ckoenig.leichtzumerken@...il.com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 Tvrtko Ursulin <tvrtko.ursulin@...lia.com>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 2/3] drm/sched: Adjust outdated docu for run_job()

Hi Philipp,

On 20/02/25 08:28, Philipp Stanner wrote:
> The documentation for drm_sched_backend_ops.run_job() mentions a certain
> function called drm_sched_job_recovery(). This function does not exist.
> What's actually meant is drm_sched_resubmit_jobs(), which is by now also
> deprecated.
> 
> Remove the mention of the removed function.
> 
> Discourage the behavior of drm_sched_backend_ops.run_job() being called
> multiple times for the same job.

It looks odd to me that this patch removes lines that were added in
patch 1/3. Maybe you could reorder the series and place this patch
first.

> 
> Signed-off-by: Philipp Stanner <phasta@...nel.org>
> ---
>   include/drm/gpu_scheduler.h | 19 +++++++++++++------
>   1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 916279b5aa00..29e5bda91806 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -421,20 +421,27 @@ struct drm_sched_backend_ops {
>   
>   	/**
>   	 * @run_job: Called to execute the job once all of the dependencies
> -	 * have been resolved. This may be called multiple times, if
> -	 * timedout_job() has happened and drm_sched_job_recovery() decides to
> -	 * try it again.
> +	 * have been resolved.
> +	 *
> +	 * The deprecated drm_sched_resubmit_jobs() (called from
> +	 * drm_sched_backend_ops.timedout_job()) can invoke this again with the

I think this should be "@timedout_job".

> +	 * same parameters. Using this is discouraged because it, presumably,
> +	 * violates dma_fence rules.

I believe this should be "struct dma_fence".

> +	 *
> +	 * TODO: Document which fence rules above.
>   	 *
>   	 * @sched_job: the job to run
>   	 *
> -	 * Returns: dma_fence the driver must signal once the hardware has
> -	 *	completed the job ("hardware fence").
> -	 *
>   	 * Note that the scheduler expects to 'inherit' its own reference to
>   	 * this fence from the callback. It does not invoke an extra
>   	 * dma_fence_get() on it. Consequently, this callback must take a
>   	 * reference for the scheduler, and additional ones for the driver's
>   	 * respective needs.

Would it be possible to add a comment saying that `run_job()` must check
whether `s_fence->finished.error` is nonzero? If you increase the karma
of a job and don't check `s_fence->finished.error`, you might run a
cancelled job.
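
For illustration only, here is a minimal userspace sketch of the check
described above. The three structs are cut-down stand-ins for the kernel
types and carry only the fields used here. example_run_job(),
example_hw_fence, example_submitted and example_usage() are hypothetical
names, not part of any driver. Returning NULL for a cancelled job is one
of the failure returns the patch documents (NULL or an ERR_PTR); it is
an assumption of this sketch, not a rule stated by the scheduler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types (assumption: only these fields). */
struct dma_fence { int error; };
struct drm_sched_fence { struct dma_fence finished; };
struct drm_sched_job { struct drm_sched_fence *s_fence; };

/* Hypothetical hardware fence and submission counter for the sketch. */
static struct dma_fence example_hw_fence;
static int example_submitted;

/* Hypothetical callback shaped like drm_sched_backend_ops.run_job(). */
static struct dma_fence *example_run_job(struct drm_sched_job *sched_job)
{
	/* A nonzero error on the finished fence marks the job as cancelled. */
	if (sched_job->s_fence->finished.error)
		return NULL;

	/* Stand-in for pushing the job to the hardware. */
	example_submitted++;
	return &example_hw_fence;
}

/* Usage: a healthy job is submitted, a cancelled one is skipped. */
static void example_usage(void)
{
	struct drm_sched_fence s_fence = { .finished = { .error = 0 } };
	struct drm_sched_job job = { .s_fence = &s_fence };
	int before = example_submitted;

	assert(example_run_job(&job) == &example_hw_fence);

	s_fence.finished.error = -ECANCELED;
	assert(example_run_job(&job) == NULL);

	/* Only the first call reached the "hardware". */
	assert(example_submitted == before + 1);
}
```

Without the early return, the second call would submit the job again
even though its finished fence already carries an error.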

> +	 *
> +	 * Return:
> +	 * * On success: dma_fence the driver must signal once the hardware has
> +	 * completed the job ("hardware fence").

A suggestion: "the fence that the driver must signal once the hardware
has completed the job".

Best Regards,
- Maíra

> +	 * * On failure: NULL or an ERR_PTR.
>   	 */
>   	struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
>