linux-kernel - Re: [PATCH RFC 10/18] drm/scheduler: Add can_run

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <cd788ccf-0cf1-85d5-1bf8-efc259bd7e11@amd.com>
Date:   Wed, 8 Mar 2023 09:46:38 +0100
From:   Christian König <christian.koenig@....com>
To:     Asahi Lina <lina@...hilina.net>,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Maxime Ripard <mripard@...nel.org>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Miguel Ojeda <ojeda@...nel.org>,
        Alex Gaynor <alex.gaynor@...il.com>,
        Wedson Almeida Filho <wedsonaf@...il.com>,
        Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
        Björn Roy Baron <bjorn3_gh@...tonmail.com>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        Luben Tuikov <luben.tuikov@....com>,
        Jarkko Sakkinen <jarkko@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>
Cc:     Alyssa Rosenzweig <alyssa@...enzweig.io>,
        Karol Herbst <kherbst@...hat.com>,
        Ella Stanforth <ella@...unix.org>,
        Faith Ekstrand <faith.ekstrand@...labora.com>,
        Mary <mary@...y.zone>, linux-kernel@...r.kernel.org,
        dri-devel@...ts.freedesktop.org, rust-for-linux@...r.kernel.org,
        linux-media@...r.kernel.org, linaro-mm-sig@...ts.linaro.org,
        linux-sgx@...r.kernel.org, asahi@...ts.linux.dev
Subject: Re: [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback

Am 07.03.23 um 15:25 schrieb Asahi Lina:
> Some hardware may require more complex resource utilization accounting
> than the simple job count supported by drm_sched internally. Add a
> can_run_job callback to allow drivers to implement more logic before
> deciding whether to run a GPU job.

Well complete NAK.

This is clearly going against the idea of having jobs only depend on 
fences and nothing else which is mandatory for correct memory management.

If the hw is busy with something you need to return the fence for this 
from the prepare_job callback so that the scheduler can be notified when 
the hw is available again.

Regards,
Christian.

>
> Signed-off-by: Asahi Lina <lina@...hilina.net>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 10 ++++++++++
>   include/drm/gpu_scheduler.h            |  8 ++++++++
>   2 files changed, 18 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 4e6ad6e122bc..5c0add2c7546 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1001,6 +1001,16 @@ static int drm_sched_main(void *param)
>   		if (!entity)
>   			continue;
>   
> +		if (sched->ops->can_run_job) {
> +			sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> +			if (!sched_job) {
> +				complete_all(&entity->entity_idle);
> +				continue;
> +			}
> +			if (!sched->ops->can_run_job(sched_job))
> +				continue;
> +		}
> +
>   		sched_job = drm_sched_entity_pop_job(entity);
>   
>   		if (!sched_job) {
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 9db9e5e504ee..bd89ea9507b9 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -396,6 +396,14 @@ struct drm_sched_backend_ops {
>   	struct dma_fence *(*prepare_job)(struct drm_sched_job *sched_job,
>   					 struct drm_sched_entity *s_entity);
>   
> +	/**
> +	 * @can_run_job: Called before job execution to check whether the
> +	 * hardware is free enough to run the job.  This can be used to
> +	 * implement more complex hardware resource policies than the
> +	 * hw_submission limit.
> +	 */
> +	bool (*can_run_job)(struct drm_sched_job *sched_job);
> +
>   	/**
>            * @run_job: Called to execute the job once all of the dependencies
>            * have been resolved.  This may be called multiple times, if
>