linux-kernel - Re: [PATCH] drm/msm: Disable frequency clamping on a630

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <53d3e5b7-9dc0-a806-70e9-b9b5ff877462@codeaurora.org>
Date:   Mon, 9 Aug 2021 22:56:40 +0530
From:   Akhil P Oommen <akhilpo@...eaurora.org>
To:     Caleb Connolly <caleb.connolly@...aro.org>,
        Rob Clark <robdclark@...il.com>
Cc:     dri-devel <dri-devel@...ts.freedesktop.org>,
        freedreno <freedreno@...ts.freedesktop.org>,
        linux-arm-msm <linux-arm-msm@...r.kernel.org>,
        Rob Clark <robdclark@...omium.org>,
        Sean Paul <sean@...rly.run>, David Airlie <airlied@...ux.ie>,
        Daniel Vetter <daniel@...ll.ch>,
        Jordan Crouse <jordan@...micpenguin.net>,
        Jonathan Marek <jonathan@...ek.ca>,
        Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Sharat Masetty <smasetty@...eaurora.org>,
        open list <linux-kernel@...r.kernel.org>,
        Stephen Boyd <sboyd@...nel.org>
Subject: Re: [PATCH] drm/msm: Disable frequency clamping on a630

On 8/9/2021 9:48 PM, Caleb Connolly wrote:
> 
> 
> On 09/08/2021 17:12, Rob Clark wrote:
>> On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen <akhilpo@...eaurora.org> 
>> wrote:
>>>
>>> On 8/8/2021 10:22 PM, Rob Clark wrote:
>>>> On Sun, Aug 8, 2021 at 7:33 AM Caleb Connolly 
>>>> <caleb.connolly@...aro.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 07/08/2021 21:04, Rob Clark wrote:
>>>>>> On Sat, Aug 7, 2021 at 12:21 PM Caleb Connolly
>>>>>> <caleb.connolly@...aro.org> wrote:
>>>>>>>
>>>>>>> Hi Rob, Akhil,
>>>>>>>
>>>>>>> On 29/07/2021 21:53, Rob Clark wrote:
>>>>>>>> On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
>>>>>>>> <caleb.connolly@...aro.org> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 29/07/2021 21:24, Rob Clark wrote:
>>>>>>>>>> On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
>>>>>>>>>> <caleb.connolly@...aro.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Rob,
>>>>>>>>>>>
>>>>>>>>>>> I've done some more testing! It looks like before that patch 
>>>>>>>>>>> ("drm/msm: Devfreq tuning") the GPU would never get above
>>>>>>>>>>> the second frequency in the OPP table (342MHz) (at least, not 
>>>>>>>>>>> in glxgears). With the patch applied it would more
>>>>>>>>>>> aggressively jump up to the max frequency which seems to be 
>>>>>>>>>>> unstable at the default regulator voltages.
>>>>>>>>>>
>>>>>>>>>> *ohh*, yeah, ok, that would explain it
>>>>>>>>>>
>>>>>>>>>>> Hacking the pm8005 s1 regulator (which provides VDD_GFX) up 
>>>>>>>>>>> to 0.988v (instead of the stock 0.516v) makes the GPU stable
>>>>>>>>>>> at the higher frequencies.
>>>>>>>>>>>
>>>>>>>>>>> Applying this patch reverts the behaviour, and the GPU never 
>>>>>>>>>>> goes above 342MHz in glxgears, losing ~30% performance in
>>>>>>>>>>> glxgear.
>>>>>>>>>>>
>>>>>>>>>>> I think (?) that enabling CPR support would be the proper 
>>>>>>>>>>> solution to this - that would ensure that the regulators run
>>>>>>>>>>> at the voltage the hardware needs to be stable.
>>>>>>>>>>>
>>>>>>>>>>> Is hacking the voltage higher (although ideally not quite 
>>>>>>>>>>> that high) an acceptable short term solution until we have
>>>>>>>>>>> CPR? Or would it be safer to just not make use of the higher 
>>>>>>>>>>> frequencies on a630 for now?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is 
>>>>>>>>>> already
>>>>>>>>>> on CC and I added sboyd, maybe one of them knows better.
>>>>>>>>>>
>>>>>>>>>> In the short term, removing the higher problematic OPPs from 
>>>>>>>>>> dts might
>>>>>>>>>> be a better option than this patch (which I'm dropping), since 
>>>>>>>>>> there
>>>>>>>>>> is nothing stopping other workloads from hitting higher OPPs.
>>>>>>>>> Oh yeah that sounds like a more sensible workaround than mine .
>>>>>>>>>>
>>>>>>>>>> I'm slightly curious why I didn't have problems at higher OPPs 
>>>>>>>>>> on my
>>>>>>>>>> c630 laptop (sdm850)
>>>>>>>>> Perhaps you won the sillicon lottery - iirc sdm850 is binned 
>>>>>>>>> for higher clocks as is out of the factory.
>>>>>>>>>
>>>>>>>>> Would it be best to drop the OPPs for all devices? Or just 
>>>>>>>>> those affected? I guess it's possible another c630 might
>>>>>>>>> crash where yours doesn't?
>>>>>>>>
>>>>>>>> I've not heard any reports of similar issues from the handful of 
>>>>>>>> other
>>>>>>>> folks with c630's on #aarch64-laptops.. but I can't really say 
>>>>>>>> if that
>>>>>>>> is luck or not.
>>>>>>> It looks like this affects at least the OnePlus 6 and PocoPhone 
>>>>>>> F1, I've done some more poking and the following diff
>>>>>>> seems to fix the stability issues completely, it seems the delay 
>>>>>>> is required to let the update propagate.
>>>>>>>
>>>>>>> This doesn't feel like the right fix, but hopefully it's enough 
>>>>>>> to come up with a better solution than disabling the new
>>>>>>> devfreq behaviour on a630.
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
>>>>>>> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>>> index d7cec7f0dde0..69e2a5e84dae 100644
>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>>> @@ -139,6 +139,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, 
>>>>>>> struct dev_pm_opp *opp)
>>>>>>>                     return;
>>>>>>>             }
>>>>>>>
>>>>>>> +       dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
>>>>>>> +
>>>>>>> +       usleep_range(300, 500);
>>>>>>> +
>>>>>>
>>>
>>> I am a bit confused. We don't define a power domain for gpu in dt,
>>> correct? Then what exactly set_opp do here? Do you think this usleep is
>>> what is helping here somehow to mask the issue?
> The power domains (for cx and gx) are defined in the GMU DT, the OPPs in 
> the GPU DT. For the sake of simplicity I'll refer to the lowest 
> frequency (257000000) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as 
> the "min" state, and the highest frequency (710000000) and OPP level 
> (RPMH_REGULATOR_LEVEL_TURBO_L1) as the "max" state. These are defined in 
> sdm845.dtsi under the gpu node.
> 
> The new devfreq behaviour unmasks what I think is a driver bug, it 
> inadvertently puts much more strain on the GPU regulators than they 
> usually get. With the new behaviour the GPU jumps from it's min state to 
> the max state and back again extremely rapidly under workloads as small 
> as refreshing UI. Where previously the GPU would rarely if ever go above 
> 342MHz when interacting with the device, it now jumps between min and 
> max many times per second.
> 
> If my understanding is correct, the current implementation of the GMU 
> set freq is the following:
>   - Get OPP for frequency to set
>   - Push the frequency to the GMU - immediately updating the core clock
>   - Call dev_pm_opp_set_opp() which triggers a notify chain, this winds 
> up somewhere in power management code and causes the gx regulator level 
> to be updated

Nope. dev_pm_opp_set_opp() sets the bandwidth for gpu and nothing else. 
We were using a different api earlier which got deprecated - 
dev_pm_opp_set_bw().

> 
> The regulator will then take some time to reach it's new voltage level 
> and stabilise. I believe that rapid transitions between min and max 
> state - in combination with the increased current load from the GPU core 
> - lead to the regulator becoming unstable (e.g. when it's requested to 
> transition from it's lowest to highest levels immediately after 
> transitioning down), the unstable voltage causes the GPU to crash.
> 
> Sillicon lottery will of course play a role here - this is very much an 
> edge case and would definitely be different on a per-device and even 
> per-unit basis.
>>
>> Hmm, I thought "opp-level = RPMH_REGULATOR_LEVEL_*" did *something*,
>> but tbh I'm not sure exactly what..
>>
>>> I feel we should just leave the new dcvs feature (shall we call it NAP?)
>>> disabled for a630 (and 10ms devfreq interval), until this is root 
>>> caused.
> I believe this hacky workaround expresses the root cause of the issue 
> quite clearly, by setting the OPP first and allowing the gx regulator to 
> become stable before telling the GPU to change clock speeds, we avoid 
> the edge case and prevent the crashes.
> 
> I took some rough measurements by adding logging to msm_devfreq_idle and 
> causing UI updates for ~20 seconds and that function is being called 
> about 30 times per second, this means the GPU is transitioning between 
> min (idle) state and max (active / boost) state at that frequency and 
> causing the issue I described above. It's likely that the usleep is 
> helping to mask this behaviour.
> 
> I hope this serves as a slightly better explanation of what I perceive 
> to be the issue, I realise my previous explanations were not very 
> adequate, I apologise for all the noise.
>>
>> I suppose "NAP" is a reasonable name.
>>
>> But I think that reverting to previous behavior would not be enough,
>> there is nothing stopping devfreq from jumping from min to max freq,
>> which AFAIU should be enough to trigger this.  I guess that there just
>> hasn't been enough testing with different game workloads on those
>> phones to trigger this.
> Ack
>>
>> That said, I haven't seen similar issues on my sdm850 laptop, where I
>> defn have triggered mix->max freq transitions.. I guess it would be
>> interesting to know if this issue could be reproduced on db845c, or if
>> it really is board specific?
> My db845c arrives this week, I'll definitely try and reproduce this.
>>
>> To workaround, I think we'd need to implement some way to limit that
>> maximum frequency jump (and then use delayed work to continue ramping
>> up the freq over time until we hit the target).. which seems like a
>> lot of work if this is just a board(s) specific workaround and isn't
>> needed once CPR is supported
> Based on my reasoning above, I came up with the following: reducing 
> thrashing by preventing rapid idle/active transitions. The minimum 
> active time of 30ms was just used for testing, I think some number 
> between 2 and 4 frames would be a sensible choice - the higher the safer.
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index d7cec7f0dde0..87f2d1085c3e 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -139,6 +139,8 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct 
> dev_pm_opp *opp)
>                  return;
>          }
> 
> +       dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> +
>          gmu_write(gmu, REG_A6XX_GMU_DCVS_ACK_OPTION, 0);
> 
>          gmu_write(gmu, REG_A6XX_GMU_DCVS_PERF_SETTING,
> @@ -158,7 +160,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct 
> dev_pm_opp *opp)
>          if (ret)
>                  dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", 
> ret);
> 
> -       dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
>          pm_runtime_put(gmu->dev);
>   }
> 
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 0e4b45bff2e6..0e2293bcb46d 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -99,8 +99,8 @@ struct msm_gpu_devfreq {
>          /** time: Time of last sampling period. */
>          ktime_t time;
> 
> -       /** idle_time: Time of last transition to idle: */
> -       ktime_t idle_time;
> +       /** transition_time: Time of last transition between 
> idle/active: */
> +       ktime_t transition_time;
> 
>          /**
>           * idle_freq:
> diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c 
> b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> index 0a1ee20296a2..774a7be33e7a 100644
> --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> @@ -157,7 +157,7 @@ void msm_devfreq_active(struct msm_gpu *gpu)
>           */
>          mutex_lock(&df->devfreq->lock);
> 
> -       idle_time = ktime_to_ms(ktime_sub(ktime_get(), df->idle_time));
> +       idle_time = ktime_to_ms(ktime_sub(ktime_get(), 
> df->transition_time));
> 
>          /*
>           * If we've been idle for a significant fraction of a polling
> @@ -168,7 +168,7 @@ void msm_devfreq_active(struct msm_gpu *gpu)
>                  target_freq *= 2;
>          }
> 
> -       df->idle_freq = 0;
> +       df->transition_time = ktime_get();;
> 
>          msm_devfreq_target(&gpu->pdev->dev, &target_freq, 0);
> 
> @@ -185,6 +185,16 @@ void msm_devfreq_idle(struct msm_gpu *gpu)
>   {
>          struct msm_gpu_devfreq *df = &gpu->devfreq;
>          unsigned long idle_freq, target_freq = 0;
> +       unsigned int active_time;
> +
> +       active_time = ktime_to_ms(ktime_sub(ktime_get(), 
> df->transition_time));
> +       /*
> +        * Don't go back to idle unless we've been active for at least 30ms
> +        * to avoid thrashing.

This basically defeats the purpose of this feature! At least, we should 
keep this '30' gpu specific. Does a Kconfig makes sense here?? BTW, if 
300us was helping you earlier why do you want it to be 30ms now?

> +        */
> +       if (active_time < 30) {
> +               return;
> +       }
> 
>          /*
>           * Hold devfreq lock to synchronize with get_dev_status()/
> @@ -196,7 +206,7 @@ void msm_devfreq_idle(struct msm_gpu *gpu)
> 
>          msm_devfreq_target(&gpu->pdev->dev, &target_freq, 0);
> 
> -       df->idle_time = ktime_get();
> +       df->transition_time = ktime_get();
>          df->idle_freq = idle_freq;
> 
>          mutex_unlock(&df->devfreq->lock);
>>
>> BR,
>> -R
>>
>>>>>> Hmm, this is going to be in the critical path on idle -> active
>>>>>> transition (ie. think response time to user-input).. so we defn don't
>>>>>> want to do this unconditionally..
>>>>>>
>>>>>> If I understand the problem, we just want to limit how far we jump 
>>>>>> the
>>>>>> gpu freq in one go.. maybe deleting the lowest (and perhaps highest)
>>>>>> OPP would accomplish that?  Could that be done in the board(s)'s
>>>>>> toplevel dts files?
>>>>> That would be a workaround, however I'd really like to avoid 
>>>>> limiting performance as a solution if I can help it,
>>>>> especially as the fix might just be "set the opp first, wait for it 
>>>>> to apply, then set the core clock".
>>>>>
>>>>> Is there a sensible way to get a callback from the opp notify 
>>>>> chain? Or from rpmh directly? Or is this solution really
>>>>> not the right way to go?
>>>>
>>>> It does seem a bit strange to me that we are telling GMU to change
>>>> freq before calling dev_pm_opp_set_opp()..  if dev_pm_opp_set_opp() is
>>>> increasing voltage, it seems like you'd want to do that *before*
>>>> increasing freq (but reverse the order when decreasing freq).. But I'm
>>>> not an expert on the ways of the GMU..  maybe Akhil or Jordan knows
>>>> better how this is supposed to work.
>>>
>>> For legacy gmu, we trigger DCVS using DCVS OOB which comes later in this
>>> function. But the order between regulator and clock which you mentioned
>>> is correct.
>>>
>>>>
>>>> But the delay seems like papering something over, and I'm trying to go
>>>> in the other direction and reduce latency between user input and
>>>> pageflip..
>>>>
>>>> BR,
>>>> -R
>>>>
>>>>>>
>>>>>> BR,
>>>>>> -R
>>>>>>
>>>>>>>             gmu_write(gmu, REG_A6XX_GMU_DCVS_ACK_OPTION, 0);
>>>>>>>
>>>>>>>             gmu_write(gmu, REG_A6XX_GMU_DCVS_PERF_SETTING,
>>>>>>> @@ -158,7 +162,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, 
>>>>>>> struct dev_pm_opp *opp)
>>>>>>>             if (ret)
>>>>>>>                     dev_err(gmu->dev, "GMU set GPU frequency 
>>>>>>> error: %d\n", ret);
>>>>>>>
>>>>>>> -       dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
>>>>>>>             pm_runtime_put(gmu->dev);
>>>>>>>      }
>>>>>>>>
>>>>>>>> Maybe just remove it for affected devices?  But I'll defer to 
>>>>>>>> Bjorn.
>>>>>>>>
>>>>>>>> BR,
>>>>>>>> -R
>>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>> Kind Regards,
>>>>>>> Caleb (they/them)
>>>>>
>>>>> -- 
>>>>> Kind Regards,
>>>>> Caleb (they/them)
>>>
>