lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aNq9/6Ks2N3A1hDu@lstrano-desk.jf.intel.com>
Date: Mon, 29 Sep 2025 10:12:31 -0700
From: Matthew Brost <matthew.brost@...el.com>
To: Philipp Stanner <phasta@...nel.org>
CC: Danilo Krummrich <dakr@...nel.org>, Christian König
	<ckoenig.leichtzumerken@...il.com>, <dri-devel@...ts.freedesktop.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] drm/sched: Mandate usage of drm_sched_job_cleanup()

On Fri, Sep 26, 2025 at 02:36:31PM +0200, Philipp Stanner wrote:
> drm_sched_job_cleanup()'s documentation so far uses relatively soft
> language, only "recommending" usage of the function. To avoid memory
> leaks and, potentiall, other bugs, however, the function has to be used.
> 
> Demand usage of the function explicitly.
> 
> Signed-off-by: Philipp Stanner <phasta@...nel.org>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 46119aacb809..0a9df9e61963 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1030,7 +1030,7 @@ EXPORT_SYMBOL(drm_sched_job_has_dependency);
>   *
>   * Cleans up the resources allocated with drm_sched_job_init().
>   *
> - * Drivers should call this from their error unwind code if @job is aborted
> + * Drivers need to call this from their error unwind code if @job is aborted
>   * before drm_sched_job_arm() is called.
>   *
>   * drm_sched_job_arm() is a point of no return since it initializes the fences
> @@ -1038,7 +1038,7 @@ EXPORT_SYMBOL(drm_sched_job_has_dependency);
>   * submit it with drm_sched_entity_push_job() and cannot simply abort it by
>   * calling drm_sched_job_cleanup().
>   *
> - * This function should be called in the &drm_sched_backend_ops.free_job callback.
> + * This function must be called in the &drm_sched_backend_ops.free_job callback.

I believe both the existing documentation and the new version are
incorrect.

Also, the free_job callback is probably misnamed. This ties into a
statement made during your XDC presentation today that confused me. The
DRM scheduler only holds a reference to the driver-allocated job from
the time run_job is called (i.e., once drm_sched_job_arm is invoked)
until the job’s fence signals. It does not own the responsibility of
freeing the job — the driver should manage the job’s reference count and
free it as appropriate.

For most drivers, the only reference is the creation reference, which is
transferred to the DRM scheduler after drm_sched_job_arm and released in
free_job.

So, I think what really needs to be documented is this reference
counting transfer. drm_sched_job_cleanup is actually associated with
drm_sched_job_init — meaning drm_sched_job_cleanup must be called if
drm_sched_job_init was called. This is typically handled by the driver
through reference counting of the job.

Finally, I think free_job should probably be renamed to something like
sched_job_put to better reflect its purpose.

We can discuss this more in person on Wednesday at the workshop or grab
in the hallway.

Matt

>   */
>  void drm_sched_job_cleanup(struct drm_sched_job *job)
>  {
> -- 
> 2.49.0
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ