[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKGbVbsydzXyKuhN8VyW9zYwuOMWzvz192WKKReHVX1XCnuXGQ@mail.gmail.com>
Date: Thu, 18 Jan 2024 09:36:34 +0800
From: Qiang Yu <yuq825@...il.com>
To: Erico Nunes <nunes.erico@...il.com>
Cc: dri-devel@...ts.freedesktop.org, lima@...ts.freedesktop.org,
anarsoul@...il.com, Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>,
Sumit Semwal <sumit.semwal@...aro.org>, christian.koenig@....com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 1/6] drm/lima: fix devfreq refcount imbalance for job timeouts
So this is caused by same job trigger both done and timeout handling?
I think a better way to solve this is to make sure only one handler
(done or timeout) process the job instead of just making lima_pm_idle()
unique.
Regards,
Qiang
On Wed, Jan 17, 2024 at 11:12 AM Erico Nunes <nunes.erico@...il.com> wrote:
>
> In case a task manages to complete but it took just long enough to also
> trigger the timeout handler, the current code results in a refcount
> imbalance on lima_pm_idle.
>
> While this can be a rare occurrence, when it happens it may fill user
> logs with stack traces such as:
>
> [10136.669170] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/lima/lima_devfreq.c:205 lima_devfreq_record_idle+0xa0/0xb0
> ...
> [10136.669459] pc : lima_devfreq_record_idle+0xa0/0xb0
> ...
> [10136.669628] Call trace:
> [10136.669634] lima_devfreq_record_idle+0xa0/0xb0
> [10136.669646] lima_sched_pipe_task_done+0x5c/0xb0
> [10136.669656] lima_gp_irq_handler+0xa8/0x120
> [10136.669666] __handle_irq_event_percpu+0x48/0x160
> [10136.669679] handle_irq_event+0x4c/0xc0
>
> The imbalance happens because lima_sched_pipe_task_done() already calls
> lima_pm_idle for this case if there was no error.
> Check the error flag in the timeout handler to ensure we can never run
> into this case.
>
> Signed-off-by: Erico Nunes <nunes.erico@...il.com>
> ---
> drivers/gpu/drm/lima/lima_sched.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index c3bf8cda8498..66317296d831 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -427,7 +427,8 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
> pipe->current_vm = NULL;
> pipe->current_task = NULL;
>
> - lima_pm_idle(ldev);
> + if (pipe->error)
> + lima_pm_idle(ldev);
>
> drm_sched_resubmit_jobs(&pipe->base);
> drm_sched_start(&pipe->base, true);
> --
> 2.43.0
>
Powered by blists - more mailing lists