Message-ID: <75fffbf97ec274eb7b2727c9506d7aaa49b7d24e.camel@redhat.com>
Date: Tue, 19 Nov 2024 14:48:41 +0100
From: Philipp Stanner <pstanner@...hat.com>
To: Luben Tuikov <ltuikov89@...il.com>,
	Matthew Brost <matthew.brost@...el.com>,
	Danilo Krummrich <dakr@...nel.org>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	David Airlie <airlied@...il.com>,
	Simona Vetter <simona@...ll.ch>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
	Christian König <christian.koenig@....com>
Subject: Re: [PATCH 1/2] drm/sched: Fix drm_sched_entity_flush() return val
On Tue, 2024-11-19 at 14:41 +0100, Philipp Stanner wrote:
> The documentation of drm_sched_entity_flush() states that the function
> shall always return the remaining timeout in jiffies.
>
> However, that is not what the function actually does; in one of its if
> branches it simply returns the unchanged timeout value.
>
> Furthermore, wait_event_timeout(), which the function uses internally,
> doesn't always return the remaining timeout time.
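
For reference, wait_event_timeout()'s return convention, as documented
in include/linux/wait.h: 0 if the condition was still false when the
timeout elapsed, 1 if it became true right at the timeout, and the
remaining jiffies (at least 1) if it became true earlier. A minimal
sketch of a caller that only cares about "timed out or not" (wq, cond
and timeout here are placeholders, not code from the scheduler):

	long rem;

	rem = wait_event_timeout(wq, cond, timeout);
	if (rem == 0)
		pr_debug("wait timed out\n");
	else
		pr_debug("condition met, %ld jiffies left\n", rem);

Note that a return value of 1 can mean either "became true at the
timeout" or "1 jiffy remaining".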
>
> Adjust the function so that it actually does what the documentation
> states it shall do.
>
> Cc: Christian König <christian.koenig@....com>
> Signed-off-by: Philipp Stanner <pstanner@...hat.com>
> ---
> @AMD:
> You guys are the only ones who use the function's return code, so I
> leave it completely up to you to decide what behavior you want.
>
> But we should at least do something, because right now the function's
> documentation and behavior do not match.
>
> P.
> ---
> drivers/gpu/drm/scheduler/sched_entity.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index a75eede8bf8d..16b172aee453 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -278,7 +278,7 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>   * waiting, removes the entity from the runqueue and returns an error when the
>   * process was killed.
>   *
> - * Returns the remaining time in jiffies left from the input timeout
> + * Returns: 0 if the timeout elapsed, the remaining time otherwise.
>   */
>  long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>  {
> @@ -294,15 +294,24 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>  	 * The client will not queue more IBs during this fini, consume existing
>  	 * queued IBs or discard them on SIGKILL
>  	 */
> -	if (current->flags & PF_EXITING) {
> -		if (timeout)
> -			ret = wait_event_timeout(
> -					sched->job_scheduled,
> -					drm_sched_entity_is_idle(entity),
> -					timeout);
> +	if (timeout != 0 && (current->flags & PF_EXITING)) {
> +		ret = wait_event_timeout(sched->job_scheduled,
> +					 drm_sched_entity_is_idle(entity),
> +					 timeout);
> +		/*
> +		 * wait_event_timeout() returns 1 if it timed out but the
> +		 * condition became true on timeout. We only care about
> +		 * whether it timed out or not.
> +		 */
> +		if (ret == 1)
> +			ret = 0;
>  	} else {
>  		wait_event_killable(sched->job_scheduled,
>  				    drm_sched_entity_is_idle(entity));
> +
> +		ret -= (long)get_jiffies_64();
Ah, I just realized that this is probably nonsense; anyway, let's
discuss what we shall do here. I can fix it later. One possible
direction is sketched below the diff.
P.
> +		if (ret < 0)
> +			ret = 0;
>  	}
> 
>  	/* For killed process disable any more IBs enqueue right now */
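
For the wait_event_killable() branch, something like the following
might be closer to what the documentation promises (a sketch only,
untested; it assumes ret is still initialized to timeout at the top of
the function, and the local variable "start" does not exist in the
current code):

	} else {
		unsigned long start = jiffies;

		wait_event_killable(sched->job_scheduled,
				    drm_sched_entity_is_idle(entity));

		/*
		 * Report the caller's remaining budget: subtract the
		 * jiffies that actually passed while waiting, clamped
		 * to 0 so we never return a negative remainder.
		 */
		ret = timeout - (long)(jiffies - start);
		if (ret < 0)
			ret = 0;
	}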