Message-ID: <20231218155927.368881-1-robdclark@gmail.com>
Date: Mon, 18 Dec 2023 07:59:24 -0800
From: Rob Clark <robdclark@...il.com>
To: dri-devel@...ts.freedesktop.org
Cc: freedreno@...ts.freedesktop.org,
linux-arm-msm@...r.kernel.org,
Rob Clark <robdclark@...omium.org>,
David Heidelberg <david.heidelberg@...labora.com>,
Rob Clark <robdclark@...il.com>,
Abhinav Kumar <quic_abhinavk@...cinc.com>,
Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
Sean Paul <sean@...rly.run>,
Marijn Suijten <marijn.suijten@...ainline.org>,
David Airlie <airlied@...il.com>,
Daniel Vetter <daniel@...ll.ch>,
Konrad Dybcio <konrad.dybcio@...aro.org>,
Akhil P Oommen <quic_akhilpo@...cinc.com>,
Danylo Piliaiev <dpiliaiev@...lia.com>,
Bjorn Andersson <andersson@...nel.org>,
linux-kernel@...r.kernel.org (open list)
Subject: [PATCH] drm/msm/a6xx: Fix recovery vs runpm race
From: Rob Clark <robdclark@...omium.org>
a6xx_recover() relies on the gpu lock to serialize against incoming
submits doing a runpm get, as it tries to temporarily balance out the
runpm gets with puts in order to power off the GPU. Unfortunately this
gets worse in a later patch, which moves the runpm get out of the
scheduler thread/work to take it out of the fence signaling path.
Instead we can simplify the whole thing by using force_suspend() /
force_resume() rather than trying to be clever.
Reported-by: David Heidelberg <david.heidelberg@...labora.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10272
Fixes: abe2023b4cea ("drm/msm/gpu: Push gpu lock down past runpm")
Signed-off-by: Rob Clark <robdclark@...omium.org>
---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 268737e59131..a5660d63535b 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1244,12 +1244,7 @@ static void a6xx_recover(struct msm_gpu *gpu)
dev_pm_genpd_add_notifier(gmu->cxpd, &gmu->pd_nb);
dev_pm_genpd_synced_poweroff(gmu->cxpd);
- /* Drop the rpm refcount from active submits */
- if (active_submits)
- pm_runtime_put(&gpu->pdev->dev);
-
- /* And the final one from recover worker */
- pm_runtime_put_sync(&gpu->pdev->dev);
+ pm_runtime_force_suspend(&gpu->pdev->dev);
if (!wait_for_completion_timeout(&gmu->pd_gate, msecs_to_jiffies(1000)))
DRM_DEV_ERROR(&gpu->pdev->dev, "cx gdsc didn't collapse\n");
@@ -1258,10 +1253,7 @@ static void a6xx_recover(struct msm_gpu *gpu)
pm_runtime_use_autosuspend(&gpu->pdev->dev);
- if (active_submits)
- pm_runtime_get(&gpu->pdev->dev);
-
- pm_runtime_get_sync(&gpu->pdev->dev);
+ pm_runtime_force_resume(&gpu->pdev->dev);
gpu->active_submits = active_submits;
mutex_unlock(&gpu->active_lock);
--
2.43.0