Date:   Mon, 7 Feb 2022 16:50:22 -0800
From:   Matthias Kaehlcke <mka@...omium.org>
To:     Lukasz Luba <lukasz.luba@....com>
Cc:     linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        amit.kachhap@...il.com, daniel.lezcano@...aro.org,
        viresh.kumar@...aro.org, rafael@...nel.org, amitk@...nel.org,
        rui.zhang@...el.com, dietmar.eggemann@....com,
        Pierre.Gondois@....com, Douglas Anderson <dianders@...omium.org>,
        Stephen Boyd <swboyd@...omium.org>,
        Rajendra Nayak <rnayak@...eaurora.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>
Subject: Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in
 cpufreq_cooling and devfreq_cooling

On Mon, Feb 07, 2022 at 07:30:35AM +0000, Lukasz Luba wrote:
> The Energy Model supports power values either in Watts or in some abstract
> scale. When the 2nd option is in use, the thermal governor IPA should not
> be allowed to operate, since the relation between cooling devices is not
> properly defined. Thus, it might be possible that big GPU has lower power
> values in abstract scale than a Little CPU. To mitigate a misbehaviour
> of the thermal control algorithm, simply not register a cooling device
> capable of working with IPA.

Ugh, this would break thermal throttling for existing devices that are
currently supported in the upstream kernel.

Wasn't the conclusion that it is the responsibility of the device tree
owners to ensure that cooling devices with different scales aren't used
in the same thermal zone?

That's also what's currently specified in the power allocator
documentation:

  Another important thing is the consistent scale of the power values
  provided by the cooling devices. All of the cooling devices in a single
  thermal zone should have power values reported either in milli-Watts
  or scaled to the same 'abstract scale'.

Which was actually added by yourself:

commit 5a64f775691647c242aa40d34f3512e7b179a921
Author: Lukasz Luba <lukasz.luba@....com>
Date:   Tue Nov 3 09:05:58 2020 +0000

    PM: EM: Clarify abstract scale usage for power values in Energy Model

    The Energy Model (EM) can store power values in milli-Watts or in abstract
    scale. This might cause issues in the subsystems which use the EM for
    estimating the device power, such as:

     - mixing of different scales in a subsystem which uses multiple
       (cooling) devices (e.g. thermal Intelligent Power Allocation (IPA))

     - assuming that energy [milli-Joules] can be derived from the EM power
       values which might not be possible since the power scale doesn't have
       to be in milli-Watts

    To avoid misconfiguration add the requisite documentation to the EM and
    related subsystems: EAS and IPA.

    Signed-off-by: Lukasz Luba <lukasz.luba@....com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>


It's ugly to have the abstract scales in the first place, but that's
unfortunately what we currently have for at least some cooling devices.

IMO it would be preferable to stick to catching non-compliant configurations
in reviews, rather than breaking thermal throttling for existing devices
with configurations that comply with the current documentation.
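
For illustration only, the per-zone rule quoted from the documentation
could be expressed as a check along the following lines. This is a
hypothetical userspace sketch, not kernel code: the names
power_scale, cdev_power_info and zone_scales_consistent() are made up
for this example and do not exist in the thermal framework or the
Energy Model API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag for the scale a cooling device reports power in. */
enum power_scale {
	SCALE_MILLIWATTS,
	SCALE_ABSTRACT,
};

/* Hypothetical per-device record; not a kernel structure. */
struct cdev_power_info {
	const char *name;
	enum power_scale scale;
};

/*
 * Return true if every cooling device in the zone reports power in the
 * same kind of scale. An empty or single-device zone is trivially
 * consistent.
 *
 * Limitation: two devices that both use an abstract scale pass this
 * check even if their abstract scales are unrelated to each other,
 * so a check like this cannot fully replace review.
 */
static bool zone_scales_consistent(const struct cdev_power_info *cdevs,
				   size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (cdevs[i].scale != cdevs[0].scale)
			return false;
	}
	return true;
}
```

Note that such a check is per thermal zone: a zone whose devices all use
the same abstract scale would still be accepted, which is the case the
existing documentation allows.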
