[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bc0961f3-5608-eab9-0937-462d498bd6e6@linaro.org>
Date: Wed, 29 Mar 2023 19:31:22 +0200
From: Konrad Dybcio <konrad.dybcio@...aro.org>
To: Rob Clark <robdclark@...il.com>
Cc: Johan Hovold <johan@...nel.org>, linux-arm-msm@...r.kernel.org,
andersson@...nel.org, agross@...nel.org,
marijn.suijten@...ainline.org,
Abhinav Kumar <quic_abhinavk@...cinc.com>,
Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
Sean Paul <sean@...rly.run>, David Airlie <airlied@...il.com>,
Daniel Vetter <daniel@...ll.ch>,
Johan Hovold <johan+linaro@...nel.org>,
Akhil P Oommen <quic_akhilpo@...cinc.com>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
Nathan Chancellor <nathan@...nel.org>,
dri-devel@...ts.freedesktop.org, freedreno@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] drm/msm/adreno: adreno_gpu: Use suspend() instead of
idle() on load error
On 29.03.2023 19:30, Rob Clark wrote:
> On Wed, Mar 29, 2023 at 8:48 AM Konrad Dybcio <konrad.dybcio@...aro.org> wrote:
>>
>>
>>
>> On 29.03.2023 16:37, Johan Hovold wrote:
>>> On Wed, Mar 29, 2023 at 04:04:44PM +0200, Konrad Dybcio wrote:
>>>> If we fail to initialize the GPU for whatever reason (say we don't
>>>> embed the GPU firmware files in the initrd), the error path involves
>>>> pm_runtime_put_sync() which then calls idle() instead of suspend().
>>>>
>>>> This is suboptimal, as it means that we're not going through the
>>>> clean shutdown sequence. With at least A619_holi, this makes the GPU
>>>> not wake up until it goes through at least one more start-fail-stop
>>>> cycle. Fix that by using pm_runtime_put_sync_suspend to force a clean
>>>> shutdown.
>>>
>>> This does not sound right. If pm_runtime_put_sync() fails to suspend the
>>> device when the usage count drops to zero, then you have a bug somewhere
>>> else.
>> I was surprised to see that it was not called as well, but I wasn't able
>> to track it down before..
>>
>>>
>>> Also since commit 2c087a336676 ("drm/msm/adreno: Load the firmware
>>> before bringing up the hardware") the firmware is loaded before even
>>> hitting these paths so the above description does not sound right in
>>> that respect either (or is missing some details).
>> ..but I did some more digging and I found that the precise "firmware"
>> that fails is the ZAP blob, which is not checked like SQE in the
>> commit you mentioned!
>>
>> Now I don't think that we can easily check for it as-is since
>> zap_shader_load_mdt() does the entire find-load-authenticate
>> dance which is required with secure assets, but it's obviously
>> possible to rip out the find-load part of that and go on from
>> there.
>>
>> Do you think that would be a better solution?
>
> Hmm, to hit this it sounds like you'd need all the fw _except_ the zap
> in the initrd?
Correct.
Konrad
>
> BR,
> -R
>
>> Konrad
>>
>>>
>>>> Test cases:
>>>> 1. firmware baked into kernel
>>>> 2. error loading fw in initrd -> load from rootfs at DE start
>>>>
>>>> Both succeed on A619_holi (SM6375) and A630 (SDM845).
>>>>
>>>> Fixes: 0d997f95b70f ("drm/msm/adreno: fix runtime PM imbalance at gpu load")
>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@...aro.org>
>>>> ---
>>>> drivers/gpu/drm/msm/adreno/adreno_device.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
>>>> index f61896629be6..59f3302e8167 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
>>>> @@ -477,7 +477,7 @@ struct msm_gpu *adreno_load_gpu(struct drm_device *dev)
>>>> return gpu;
>>>>
>>>> err_put_rpm:
>>>> - pm_runtime_put_sync(&pdev->dev);
>>>> + pm_runtime_put_sync_suspend(&pdev->dev);
>>>> err_disable_rpm:
>>>> pm_runtime_disable(&pdev->dev);
>>>
>>> Johan
Powered by blists - more mailing lists