lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 9 Feb 2022 14:17:00 -0800
From:   Matthias Kaehlcke <mka@...omium.org>
To:     Lukasz Luba <lukasz.luba@....com>
Cc:     linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        amit.kachhap@...il.com, daniel.lezcano@...aro.org,
        viresh.kumar@...aro.org, rafael@...nel.org, amitk@...nel.org,
        rui.zhang@...el.com, dietmar.eggemann@....com,
        Pierre.Gondois@....com, Douglas Anderson <dianders@...omium.org>,
        Stephen Boyd <swboyd@...omium.org>,
        Rajendra Nayak <rnayak@...eaurora.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>
Subject: Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in
 cpufreq_cooling and devfreq_cooling

On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote:
> 
> 
> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote:
> > On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote:
> > > 
> > > 
> > > On 2/8/22 12:50 AM, Matthias Kaehlcke wrote:
> > > > On Mon, Feb 07, 2022 at 07:30:35AM +0000, Lukasz Luba wrote:
> > > > > The Energy Model supports power values either in Watts or in some abstract
> > > > > scale. When the 2nd option is in use, the thermal governor IPA should not
> > > > > be allowed to operate, since the relation between cooling devices is not
> > > > > properly defined. Thus, it might be possible that big GPU has lower power
> > > > > values in abstract scale than a Little CPU. To mitigate a misbehaviour
> > > > > of the thermal control algorithm, simply not register a cooling device
> > > > > capable of working with IPA.
> > > > 
> > > > Ugh, this would break thermal throttling for existing devices that are
> > > > currently supported in the upstream kernel.
> > > 
> > > Could you point me to those devices? I cannot find them in the mainline
> > > DT. There are no GPU devices which register Energy Model (EM) in
> > > upstream, neither using DT (which would be power in mW) nor explicitly
> > > providing EM get_power() callback. The EM is needed to have IPA.
> > > 
> > > Please clarify which existing devices are going to be broken with this
> > > change.
> > 
> > I was thinking about arch/arm64/boot/dts/qcom/sc7180-trogdor-*, and
> > potentially other SC7180 boards that use IPA for the CPU thermal
> > zones.
> > 
> > Initially SC7180 used an abstract scale for the CPU energy model,
> > however I realized your change doesn't actually impact SC7180 CPUs
> > for two reasons:
> > 
> > 1. The energy model of the CPUs is registered through
> > 
> >    cpufreq_register_em_with_opp
> >      dev_pm_opp_of_register_em
> >        em_dev_register_perf_domain
> > 
> > em_dev_register_perf_domain() is called with 'milliwatts = true',
> > regardless of the potentially abstract scale, so IPA would not be
> > disabled with your change.
> 
> That good registration.
> 
> > 
> > 2. There is a patch from Doug that adjusted the dynamic power
> > coefficients of the CPUs to be closer to reality:
> > 
> > commit 82ea7d411d43f60dce878252558e926f957109f0
> > Author: Douglas Anderson <dianders@...omium.org>
> > Date:   Thu Sep 2 14:51:37 2021 -0700
> > 
> >      arm64: dts: qcom: sc7180: Base dynamic CPU power coefficients in reality
> 
> Thanks for the link to the commit. I'll have a look. It's good that it
> doesn't break this board.
> 
> > 
> > > > Wasn't the conclusion that it is the responsability of the device tree
> > > > owners to ensure that cooling devices with different scales aren't used
> > > > in the same thermal zone?
> > > 
> > > It's based on assumption that someone has DT and control. There was also
> > > implicit assumption that IPA would work properly on such platform - but
> > > it won't.
> > > 
> > > 1. You cannot have 2 thermal zones: one with CPUs and other with GPU
> > > only and both working with two instances of IPA.
> > 
> > It's not clear to me why such a configuration wouldn't work. Is it also a
> > problem to have multiple CPU thermal zones (one for each core) that use
> > multiple instances of IPA? SC7180 has separate thermal zones for each core
> > (or thermal sensor), Chrome OS uses IPA for CPU thermal throttling.
> 
> Unfortunately, the control mechanism is not working best in such setup.
> Main idea of IPA is to find 'best' OPP for a cooling device, e.g.
> one in the middle of a freq table. Worst scenario is when we change
> our decision between lowest and highest OPP, in short periods.
> It's burning too much power at highest freq, then we throttle too much,
> then we unthrottle because temp is dropping by some 'good enough' value.
> This situation can happen due to a few reasons:
> 1. Power values from EM are not reflecting reality, e.g. scaled too much
> 2. We don't have proper information about CPU load and frequencies used
> 3. PID coefficients are not properly set
> 4. board has small physical thermal sink potential but silicon if 'hot'
> 5. the workload is too dynamic
> 6. there is another power hungry device (GPU, DSP, ISP) which is outside
>    of our control loop, e.g. is controlled in other thermal zone and has
>    impact on our temp sensor value

Thanks for the details!

> Proper setup and tuning of IPA could bring quite good benefit, because
> it could pick the 'best at that moment' OPPs for the devices, instead
> of throttling too hard and then unthrottling. Unfortunately, it's
> tricky and needs time (experiments, understanding the system).

On some sc7180 ChromeOS boards we observed a behavior as you describe,
we mitigated it by tuning the device 'sustainable-power' device tree
property and the trip point temperatures.

> We have been trying to easy this entry barrier for developers. Very good
> results brings the optimization of the PID coefficients. The automated
> mechanism would figure out best values for the given board based on the
> tests run. It's under review now, there are also shared results [1][2].
> It would solve a some of the issues with complex tuning.

Good to know, thanks for the pointer!

> I was going to give it a try for my old experiments with the bogoWatts
> devices, but I don't expect that this is a silver bullet. Regarding
> the 'two thermal zones with IPA issues' I'm prity sure it won't help.
> 
> > 
> > > 2. The abstract power scale doesn't guaranty anything about power values
> > > and IPA was simply designed with milli-Watts in mind. So even working
> > > on CPUs only using bogoWatts, is not what we could guaranty in IPA.
> > 
> > That's bad news for SoCs that are configured with bogoWatt values, from
> > past discussions I had the impression that this is unfortunately not
> > uncommon.
> > 
> > > It's ugly to have the abstract scales in the first place, but that's
> > > unfortunately what we currently have for at least some cooling devices.
> > 
> > > A few questions:
> > > 
> > > Do you use 'we' as Chrome engineers?
> > 
> > I was referring to the kernel, in particular QCOM SC7180.
> 
> Thanks for sharing the name.
> 
> > 
> > > Could you point me to those devices please?
> > 
> > arch/arm64/boot/dts/qcom/sc7180-trogdor-*
> > 
> > Though as per above they shouldn't be impacted by your change, since the
> > CPUs always pretend to use milli-Watts.
> > 
> > [skipped some questions/answers since sc7180 isn't actually impacted by
> >   the change]
> 
> Thank you Matthias. I will investigate your setup to get better
> understanding.

Thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ