Message-ID: <YTeIrRwAkItd1xQE@ripper>
Date: Tue, 7 Sep 2021 08:43:41 -0700
From: Bjorn Andersson <bjorn.andersson@...aro.org>
To: Rob Clark <robdclark@...il.com>
Cc: Caleb Connolly <caleb.connolly@...aro.org>,
Rob Clark <robdclark@...omium.org>,
Akhil P Oommen <akhilpo@...eaurora.org>,
dri-devel <dri-devel@...ts.freedesktop.org>,
freedreno <freedreno@...ts.freedesktop.org>,
linux-arm-msm <linux-arm-msm@...r.kernel.org>,
Sean Paul <sean@...rly.run>, David Airlie <airlied@...ux.ie>,
Daniel Vetter <daniel@...ll.ch>,
Jordan Crouse <jordan@...micpenguin.net>,
Jonathan Marek <jonathan@...ek.ca>,
Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>,
Sharat Masetty <smasetty@...eaurora.org>,
open list <linux-kernel@...r.kernel.org>,
Stephen Boyd <sboyd@...nel.org>
Subject: Re: [PATCH] drm/msm: Disable frequency clamping on a630
On Mon 09 Aug 14:08 PDT 2021, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 1:35 PM Caleb Connolly <caleb.connolly@...aro.org> wrote:
> >
> >
> >
> > On 09/08/2021 18:58, Rob Clark wrote:
> > > On Mon, Aug 9, 2021 at 10:28 AM Akhil P Oommen <akhilpo@...eaurora.org> wrote:
> > >>
> > >> On 8/9/2021 9:48 PM, Caleb Connolly wrote:
> > >>>
> > >>>
> > >>> On 09/08/2021 17:12, Rob Clark wrote:
> > >>>> On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen <akhilpo@...eaurora.org>
> > >>>> wrote:
> > >>>>>
> > >>>>> On 8/8/2021 10:22 PM, Rob Clark wrote:
> > >>>>>> On Sun, Aug 8, 2021 at 7:33 AM Caleb Connolly
> > >>>>>> <caleb.connolly@...aro.org> wrote:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 07/08/2021 21:04, Rob Clark wrote:
> > >>>>>>>> On Sat, Aug 7, 2021 at 12:21 PM Caleb Connolly
> > >>>>>>>> <caleb.connolly@...aro.org> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hi Rob, Akhil,
> > >>>>>>>>>
> > >>>>>>>>> On 29/07/2021 21:53, Rob Clark wrote:
> > >>>>>>>>>> On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
> > >>>>>>>>>> <caleb.connolly@...aro.org> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 29/07/2021 21:24, Rob Clark wrote:
> > >>>>>>>>>>>> On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
> > >>>>>>>>>>>> <caleb.connolly@...aro.org> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi Rob,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I've done some more testing! It looks like before that patch
> > >>>>>>>>>>>>> ("drm/msm: Devfreq tuning") the GPU would never get above
> > >>>>>>>>>>>>> the second frequency in the OPP table (342MHz) (at least, not
> > >>>>>>>>>>>>> in glxgears). With the patch applied it would more
> > >>>>>>>>>>>>> aggressively jump up to the max frequency which seems to be
> > >>>>>>>>>>>>> unstable at the default regulator voltages.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> *ohh*, yeah, ok, that would explain it
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hacking the pm8005 s1 regulator (which provides VDD_GFX) up
> > >>>>>>>>>>>>> to 0.988v (instead of the stock 0.516v) makes the GPU stable
> > >>>>>>>>>>>>> at the higher frequencies.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Applying this patch reverts the behaviour, and the GPU never
> > >>>>>>>>>>>>> goes above 342MHz in glxgears, losing ~30% performance in
> > >>>>>>>>>>>>> glxgears.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I think (?) that enabling CPR support would be the proper
> > >>>>>>>>>>>>> solution to this - that would ensure that the regulators run
> > >>>>>>>>>>>>> at the voltage the hardware needs to be stable.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Is hacking the voltage higher (although ideally not quite
> > >>>>>>>>>>>>> that high) an acceptable short term solution until we have
> > >>>>>>>>>>>>> CPR? Or would it be safer to just not make use of the higher
> > >>>>>>>>>>>>> frequencies on a630 for now?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is
> > >>>>>>>>>>>> already
> > >>>>>>>>>>>> on CC and I added sboyd, maybe one of them knows better.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> In the short term, removing the higher problematic OPPs from
> > >>>>>>>>>>>> dts might
> > >>>>>>>>>>>> be a better option than this patch (which I'm dropping), since
> > >>>>>>>>>>>> there
> > >>>>>>>>>>>> is nothing stopping other workloads from hitting higher OPPs.
> > >>>>>>>>>>> Oh yeah, that sounds like a more sensible workaround than mine.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'm slightly curious why I didn't have problems at higher OPPs
> > >>>>>>>>>>>> on my
> > >>>>>>>>>>>> c630 laptop (sdm850)
> > >>>>>>>>>>> Perhaps you won the silicon lottery - iirc sdm850 is binned
> > >>>>>>>>>>> for higher clocks out of the factory.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Would it be best to drop the OPPs for all devices? Or just
> > >>>>>>>>>>> those affected? I guess it's possible another c630 might
> > >>>>>>>>>>> crash where yours doesn't?
> > >>>>>>>>>>
> > >>>>>>>>>> I've not heard any reports of similar issues from the handful of
> > >>>>>>>>>> other
> > >>>>>>>>>> folks with c630's on #aarch64-laptops.. but I can't really say
> > >>>>>>>>>> if that
> > >>>>>>>>>> is luck or not.
> > >>>>>>>>> It looks like this affects at least the OnePlus 6 and PocoPhone
> > >>>>>>>>> F1. I've done some more poking, and the following diff seems to
> > >>>>>>>>> fix the stability issues completely; it seems the delay is
> > >>>>>>>>> required to let the update propagate.
> > >>>>>>>>>
> > >>>>>>>>> This doesn't feel like the right fix, but hopefully it's enough
> > >>>>>>>>> to come up with a better solution than disabling the new
> > >>>>>>>>> devfreq behaviour on a630.
> > >>>>>>>>>
> > >>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > >>>>>>>>> index d7cec7f0dde0..69e2a5e84dae 100644
> > >>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > >>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > >>>>>>>>> @@ -139,6 +139,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> > >>>>>>>>>  		return;
> > >>>>>>>>>  	}
> > >>>>>>>>>
> > >>>>>>>>> +	dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> > >>>>>>>>> +
> > >>>>>>>>> +	usleep_range(300, 500);
> > >>>>>>>>> +
> > >>>>>>>>
> > >>>>>
> > >>>>> I am a bit confused. We don't define a power domain for the gpu in
> > >>>>> dt, correct? Then what exactly does set_opp do here? Do you think
> > >>>>> this usleep is what is helping here somehow to mask the issue?
> > >>> The power domains (for cx and gx) are defined in the GMU DT, the OPPs in
> > >>> the GPU DT. For the sake of simplicity I'll refer to the lowest
> > >>> frequency (257000000) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as
> > >>> the "min" state, and the highest frequency (710000000) and OPP level
> > >>> (RPMH_REGULATOR_LEVEL_TURBO_L1) as the "max" state. These are defined in
> > >>> sdm845.dtsi under the gpu node.
> > >>>
> > >>> The new devfreq behaviour unmasks what I think is a driver bug: it
> > >>> inadvertently puts much more strain on the GPU regulators than they
> > >>> usually get. With the new behaviour the GPU jumps from its min state to
> > >>> the max state and back again extremely rapidly under workloads as small
> > >>> as refreshing the UI. Where previously the GPU would rarely if ever go
> > >>> above 342MHz when interacting with the device, it now jumps between min
> > >>> and max many times per second.
> > >>>
> > >>> If my understanding is correct, the current implementation of the GMU
> > >>> set freq is the following:
> > >>> - Get the OPP for the frequency to set
> > >>> - Push the frequency to the GMU - immediately updating the core clock
> > >>> - Call dev_pm_opp_set_opp(), which triggers a notify chain; this winds
> > >>> up somewhere in power management code and causes the gx regulator level
> > >>> to be updated
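> > >>>
> > >>> In rough C, that sequence looks something like this (a sketch from
> > >>> memory of a6xx_gmu_set_freq(), with error handling and the newer HFI
> > >>> path elided - so a sketch, not the exact code):
> > >>>
> > >>> void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> > >>> {
> > >>> 	struct a6xx_gmu *gmu = &to_a6xx_gpu(to_adreno_gpu(gpu))->gmu;
> > >>> 	u32 perf_index;
> > >>>
> > >>> 	/* 1. look up the perf index for this OPP's frequency */
> > >>> 	for (perf_index = 0; perf_index < gmu->nr_gpu_freqs - 1; perf_index++)
> > >>> 		if (dev_pm_opp_get_freq(opp) == gmu->gpu_freqs[perf_index])
> > >>> 			break;
> > >>>
> > >>> 	/* 2. push the frequency to the GMU - the core clock changes here */
> > >>> 	gmu_write(gmu, REG_A6XX_GMU_DCVS_ACK_OPTION, 0);
> > >>> 	gmu_write(gmu, REG_A6XX_GMU_DCVS_PERF_SETTING,
> > >>> 		  ((3 & 0xf) << 28) | perf_index);
> > >>>
> > >>> 	/* 3. only afterwards notify the OPP layer */
> > >>> 	dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> > >>> 	pm_runtime_put(gmu->dev);
> > >>> }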
> > >>
> > >> Nope. dev_pm_opp_set_opp() sets the bandwidth for the gpu and nothing
> > >> else. We were using a different API earlier which got deprecated -
> > >> dev_pm_opp_set_bw().
> > Huh ok, thanks for the correction. So it's the GMU writes in this function which cause the regulator to be adjusted?
> > >
> > > Hmm, ok, if this is just setting icc vote, the order shouldn't be too important.
> > >
> > > I guess GMU then is the one that is controlling the regulator(s) to
> > > ensure adequate voltage for the requested freq?
> > >
> > > But the GMU fw should be the same for a618 and a630; md5sum of what
> > > I'm using (from linux-firmware):
> > >
> > > ab20135f7adf48e0f344282a37da80e4 a630_gmu.bin
> > Same here.
> > >
> > >>>
> > >>> The regulator will then take some time to reach its new voltage level
> > >>> and stabilise. I believe that rapid transitions between the min and max
> > >>> states - in combination with the increased current load from the GPU
> > >>> core - lead to the regulator becoming unstable (e.g. when it's requested
> > >>> to transition from its lowest to highest level immediately after
> > >>> transitioning down), and the unstable voltage causes the GPU to crash.
> > >>>
> > >>> Silicon lottery will of course play a role here - this is very much an
> > >>> edge case and would definitely be different on a per-device and even
> > >>> per-unit basis.
> > >>>>
> > >>>> Hmm, I thought "opp-level = RPMH_REGULATOR_LEVEL_*" did *something*,
> > >>>> but tbh I'm not sure exactly what..
> > >>>>
> > >>>>> I feel we should just leave the new dcvs feature (shall we call it
> > >>>>> NAP?) disabled for a630 (and keep the 10ms devfreq interval), until
> > >>>>> this is root caused.
> > >>> I believe this hacky workaround points at the root cause of the issue
> > >>> quite clearly: by setting the OPP first and allowing the gx regulator
> > >>> to become stable before telling the GPU to change clock speeds, we
> > >>> avoid the edge case and prevent the crashes.
> > >>>
> > >>> I took some rough measurements by adding logging to msm_devfreq_idle
> > >>> and causing UI updates for ~20 seconds; that function is being called
> > >>> about 30 times per second, which means the GPU is transitioning between
> > >>> the min (idle) state and the max (active / boost) state at that rate,
> > >>> causing the issue I described above. It's likely that the usleep is
> > >>> helping to mask this behaviour.
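> > >>>
> > >>> (The logging itself was nothing sophisticated - something along these
> > >>> lines, as a hypothetical sketch rather than the exact hack I ran:)
> > >>>
> > >>> void msm_devfreq_idle(struct msm_gpu *gpu)
> > >>> {
> > >>> 	static atomic_t calls = ATOMIC_INIT(0);
> > >>>
> > >>> 	/* crude instrumentation: count transitions to idle */
> > >>> 	pr_info("msm_devfreq_idle: call #%d\n", atomic_inc_return(&calls));
> > >>>
> > >>> 	/* ... rest of the function unchanged ... */
> > >>> }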
> > >>>
> > >>> I hope this serves as a slightly better explanation of what I perceive
> > >>> to be the issue; I realise my previous explanations were not very
> > >>> adequate, and I apologise for all the noise.
> > >>>>
> > >>>> I suppose "NAP" is a reasonable name.
> > >>>>
> > >>>> But I think that reverting to the previous behavior would not be
> > >>>> enough: there is nothing stopping devfreq from jumping from min to max
> > >>>> freq, which AFAIU should be enough to trigger this. I guess that there
> > >>>> just hasn't been enough testing with different game workloads on those
> > >>>> phones to hit this.
> > >>> Ack
> > >>>>
> > >>>> That said, I haven't seen similar issues on my sdm850 laptop, where I
> > >>>> definitely have triggered min->max freq transitions.. I guess it would
> > >>>> be interesting to know if this issue could be reproduced on db845c, or
> > >>>> if it really is board specific?
> > >>> My db845c arrives this week; I'll definitely try to reproduce this.
> > >>>>
> > >>>> To work around it, I think we'd need to implement some way to limit
> > >>>> the maximum frequency jump (and then use delayed work to continue
> > >>>> ramping up the freq over time until we hit the target).. which seems
> > >>>> like a lot of work if this is just a board-specific workaround and
> > >>>> isn't needed once CPR is supported.
> > >>> Based on my reasoning above, I came up with the following: reducing
> > >>> thrashing by preventing rapid idle/active transitions. The minimum
> > >>> active time of 30ms was just used for testing; I think some number
> > >>> between 2 and 4 frames (~33-67ms at 60Hz) would be a sensible choice -
> > >>> the higher the safer.
> > >>>
> > >>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > >>> index d7cec7f0dde0..87f2d1085c3e 100644
> > >>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > >>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > >>> @@ -139,6 +139,8 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> > >>>  		return;
> > >>>  	}
> > >>>
> > >>> +	dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> > >>> +
> > >>>  	gmu_write(gmu, REG_A6XX_GMU_DCVS_ACK_OPTION, 0);
> > >>>
> > >>>  	gmu_write(gmu, REG_A6XX_GMU_DCVS_PERF_SETTING,
> > >>> @@ -158,7 +160,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> > >>>  	if (ret)
> > >>>  		dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
> > >>>
> > >>> -	dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> > >>>  	pm_runtime_put(gmu->dev);
> > >>>  }
> > >>>
> > >>> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> > >>> index 0e4b45bff2e6..0e2293bcb46d 100644
> > >>> --- a/drivers/gpu/drm/msm/msm_gpu.h
> > >>> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> > >>> @@ -99,8 +99,8 @@ struct msm_gpu_devfreq {
> > >>>  	/** time: Time of last sampling period. */
> > >>>  	ktime_t time;
> > >>>
> > >>> -	/** idle_time: Time of last transition to idle: */
> > >>> -	ktime_t idle_time;
> > >>> +	/** transition_time: Time of last transition between idle/active: */
> > >>> +	ktime_t transition_time;
> > >>>
> > >>>  	/**
> > >>>  	 * idle_freq:
> > >>> diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > >>> index 0a1ee20296a2..774a7be33e7a 100644
> > >>> --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > >>> +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > >>> @@ -157,7 +157,7 @@ void msm_devfreq_active(struct msm_gpu *gpu)
> > >>>  	 */
> > >>>  	mutex_lock(&df->devfreq->lock);
> > >>>
> > >>> -	idle_time = ktime_to_ms(ktime_sub(ktime_get(), df->idle_time));
> > >>> +	idle_time = ktime_to_ms(ktime_sub(ktime_get(), df->transition_time));
> > >>>
> > >>>  	/*
> > >>>  	 * If we've been idle for a significant fraction of a polling
> > >>> @@ -168,7 +168,7 @@ void msm_devfreq_active(struct msm_gpu *gpu)
> > >>>  		target_freq *= 2;
> > >>>  	}
> > >>>
> > >>> -	df->idle_freq = 0;
> > >>> +	df->transition_time = ktime_get();
> > >>>
> > >>>  	msm_devfreq_target(&gpu->pdev->dev, &target_freq, 0);
> > >>>
> > >>> @@ -185,6 +185,16 @@ void msm_devfreq_idle(struct msm_gpu *gpu)
> > >>>  {
> > >>>  	struct msm_gpu_devfreq *df = &gpu->devfreq;
> > >>>  	unsigned long idle_freq, target_freq = 0;
> > >>> +	unsigned int active_time;
> > >>> +
> > >>> +	active_time = ktime_to_ms(ktime_sub(ktime_get(), df->transition_time));
> > >>> +	/*
> > >>> +	 * Don't go back to idle unless we've been active for at least 30ms
> > >>> +	 * to avoid thrashing.
> > >>
> > >> This basically defeats the purpose of this feature! At least, we should
> > >> keep this '30' gpu specific. Does a Kconfig make sense here?? BTW, if
> > >> 300us was helping you earlier, why do you want it to be 30ms now?
> > Previously I thought that the issue was related specifically to the transition from idle to active, hence sleeping to let
> > the regulator catch up. Whilst that masked the issue, it didn't *fix* it; I now think it's actually due to the repeated
> > transitions between idle and active states.
> >
> > Enforcing that the GPU stay active for at least two frames should still achieve the intended goal of reducing latency
> > while fixing the issue more reliably.
> >
> > AFAIU from reading the commit description, the goal of the devfreq tuning is to reduce latency by quickly bursting up
> > when there's user activity; by telling the GPU to stay active for longer we shouldn't impede this behaviour at all.
>
> Well, there are a couple parts to it.. one thing it was intended to
> fix was a bad devfreq behavior I was seeing with, for example, games
> that throttle themselves to 30fps, so rendering one 16ms frame every
> other vblank cycle.. previously devfreq would ramp up to max just as
> it was at the end of rendering a frame, and then sit there at fmax
> while the GPU was doing nothing for the next 16ms, and then ramp back down
> to fmin just as the GPU got some more work to do. So it was nearly
> 180deg out of phase with where you'd want it to be
> increasing/decreasing GPU freq.
>
But afaict you only change the selection of the frequency, not how the
actual change is carried out. As such this issue isn't related to your change.
> The longer polling interval is meant to smooth that out, with clamping
> to fmin while the GPU is idle to offset the fact that it would take the
> GPU longer to ramp down (and it otherwise being pointless to keep the
> GPU at a high freq when it isn't doing anything), and boosting above
> whatever freq devfreq would have picked if the gpu had been idle for a
> while (to offset the longer ramp-up on user input).
>
> So the 30ms delay for clamping to fmin would defeat one part of that.
>
> We could perhaps somehow disable the clamping to fmin for certain
> boards and/or gpus, which would possibly lose a bit of power savings
> but otherwise be ok. But I'm not clear on whether this is a
> board-specific issue (ie. are these phones using different PMICs
> compared to sdm850 laptops and db845c? Or is there some difference in
> which power rail is powering the GPU?)
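>
> Roughly what I have in mind for the per-gpu variant (untested sketch
> with a hypothetical clamp_to_idle flag, not actual code):
>
> void msm_devfreq_idle(struct msm_gpu *gpu)
> {
> 	struct msm_gpu_devfreq *df = &gpu->devfreq;
> 	unsigned long idle_freq, target_freq = 0;
>
> 	mutex_lock(&df->devfreq->lock);
>
> 	idle_freq = get_freq(gpu);
>
> 	/* hypothetical: only clamp to fmin where we know it is safe */
> 	if (gpu->clamp_to_idle)
> 		msm_devfreq_target(&gpu->pdev->dev, &target_freq, 0);
>
> 	df->idle_time = ktime_get();
> 	df->idle_freq = idle_freq;
>
> 	mutex_unlock(&df->devfreq->lock);
> }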
>
> I think it was mentioned earlier that CPR should help (AFAIU that is
> some sort of hw closed-loop voltage regulation?) so maybe this is just
> a short-term workaround?
>
On 845 and onwards, we pick a corner which will be translated to an
actual voltage by someone else, and whether CPR is involved is hidden in
that other entity.
Regards,
Bjorn