Message-ID: <k5yte4b36j23p3pfqmflhpkfesyiwoczi3avs2vavvzb3i2oct@zhssj7u5a6og>
Date: Sat, 23 Dec 2023 01:27:47 +0530
From: Akhil P Oommen <quic_akhilpo@...cinc.com>
To: Rob Clark <robdclark@...il.com>
CC: <dri-devel@...ts.freedesktop.org>, <freedreno@...ts.freedesktop.org>,
	<linux-arm-msm@...r.kernel.org>, Rob Clark <robdclark@...omium.org>,
	David Heidelberg <david.heidelberg@...labora.com>,
	Abhinav Kumar <quic_abhinavk@...cinc.com>,
	Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
	Sean Paul <sean@...rly.run>,
	Marijn Suijten <marijn.suijten@...ainline.org>,
	David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>,
	Konrad Dybcio <konrad.dybcio@...aro.org>,
	Danylo Piliaiev <dpiliaiev@...lia.com>,
	Bjorn Andersson <andersson@...nel.org>,
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] drm/msm/a6xx: Fix recovery vs runpm race
On Mon, Dec 18, 2023 at 07:59:24AM -0800, Rob Clark wrote:
>
> From: Rob Clark <robdclark@...omium.org>
>
> a6xx_recover() is relying on the gpu lock to serialize against incoming
> submits doing a runpm get, as it tries to temporarily balance out the
> runpm gets with puts in order to power off the GPU. Unfortunately this
> gets worse when we (in a later patch) will move the runpm get out of the
> scheduler thread/work to move it out of the fence signaling path.
>
> Instead we can just simplify the whole thing by using force_suspend() /
> force_resume() instead of trying to be clever.
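For context, pm_runtime_force_suspend() ignores the device's usage
count: it disables runtime PM and invokes the runtime-suspend callback
directly, so the device powers off even while other threads still hold
votes. Paraphrasing the helper from drivers/base/power/runtime.c
(error handling and wake-irq details dropped; a sketch, not the real
implementation):

    int pm_runtime_force_suspend(struct device *dev)
    {
            int (*callback)(struct device *);

            pm_runtime_disable(dev);       /* block new runtime PM activity */
            if (pm_runtime_status_suspended(dev))
                    return 0;

            /* Power off now, regardless of how many votes are held */
            callback = RPM_GET_CALLBACK(dev, runtime_suspend);
            if (callback)
                    callback(dev);

            pm_runtime_set_suspended(dev);
            return 0;
    }

With that in mind: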
In several places we take a pm_runtime vote and then access GPU
registers, assuming the GPU stays powered until we drop the vote;
a6xx_get_timestamp() is one example. If we force-suspend while such a
thread is inside that window, its register accesses can cause bus
errors. That would mean serializing every runtime_get/put site against
recovery with a mutex. Or is there a better way to handle this in the
'later patch' you mentioned?
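
To illustrate, the timestamp query does roughly this today (abridged
from adreno_get_param()'s MSM_PARAM_TIMESTAMP handling, not verbatim):

    pm_runtime_get_sync(&gpu->pdev->dev);     /* vote: keep GPU powered */
    ret = a6xx_get_timestamp(gpu, value);     /* reads GPU registers    */
    pm_runtime_put_autosuspend(&gpu->pdev->dev);

If recovery calls pm_runtime_force_suspend() between the get and the
put, the register read races with the power collapse.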
-Akhil.
>
> Reported-by: David Heidelberg <david.heidelberg@...labora.com>
> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10272
> Fixes: abe2023b4cea ("drm/msm/gpu: Push gpu lock down past runpm")
> Signed-off-by: Rob Clark <robdclark@...omium.org>
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 ++----------
> 1 file changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 268737e59131..a5660d63535b 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1244,12 +1244,7 @@ static void a6xx_recover(struct msm_gpu *gpu)
> dev_pm_genpd_add_notifier(gmu->cxpd, &gmu->pd_nb);
> dev_pm_genpd_synced_poweroff(gmu->cxpd);
>
> - /* Drop the rpm refcount from active submits */
> - if (active_submits)
> - pm_runtime_put(&gpu->pdev->dev);
> -
> - /* And the final one from recover worker */
> - pm_runtime_put_sync(&gpu->pdev->dev);
> + pm_runtime_force_suspend(&gpu->pdev->dev);
>
> if (!wait_for_completion_timeout(&gmu->pd_gate, msecs_to_jiffies(1000)))
> DRM_DEV_ERROR(&gpu->pdev->dev, "cx gdsc didn't collapse\n");
> @@ -1258,10 +1253,7 @@ static void a6xx_recover(struct msm_gpu *gpu)
>
> pm_runtime_use_autosuspend(&gpu->pdev->dev);
>
> - if (active_submits)
> - pm_runtime_get(&gpu->pdev->dev);
> -
> - pm_runtime_get_sync(&gpu->pdev->dev);
> + pm_runtime_force_resume(&gpu->pdev->dev);
>
> gpu->active_submits = active_submits;
> mutex_unlock(&gpu->active_lock);
> --
> 2.43.0
>