lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210217055029.a25wjsyoosxageti@vireshk-i7>
Date:   Wed, 17 Feb 2021 11:20:29 +0530
From:   Viresh Kumar <viresh.kumar@...aro.org>
To:     Thara Gopinath <thara.gopinath@...aro.org>
Cc:     rjw@...ysocki.net, bjorn.andersson@...aro.org,
        linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-arm-msm@...r.kernel.org
Subject: Re: [PATCH] cpufreq: exclude boost frequencies from valid count if
 not enabled

Hi Thara,

On 16-02-21, 19:00, Thara Gopinath wrote:
> This is a fix for a regression observed on db845 platforms with 5.7-rc11
> kernel.  On these platforms running stress tests with 5.11-rc7 kernel
> causes big cpus to overheat and ultimately shutdown the system due to
> hitting critical temperature (thermal throttling does not happen and
> cur_state of cpufreq cooling device for big cpus remain stuck at 0 or max
> frequency).
> 
> This platform has boost opp defined for big cpus but boost mode itself is
> disabled in the cpufreq driver. Hence the initial max frequency request
> from cpufreq cooling device(cur_state) for big cpus is for boost
> frequency(2803200) where as initial max frequency request from cpufreq
> driver itself is for the highest non boost frequency (2649600).

Okay.

> qos
> framework collates these two requests and puts the max frequency of big
> cpus to 2649600 which the thermal framework is unaware of.

It doesn't need to be aware of that. It sets its max frequency and other
frameworks can put their own requests and the lowest one wins. In this case the
other constraint came from cpufreq-core, which is fine.

> Now during an
> over heat event, with step-wise policy governor, thermal framework tries to
> throttle the cpu and places a restriction on max frequency of the cpu to
> cur_state - 1

Actually it is cur_state + 1 as the values are inversed here, cooling state 0
refers to highest frequency :)

> which in this case 2649600. qos framework in turn tells the
> cpufreq cooling device that max frequency of the cpu is already at 2649600
> and the cooling device driver returns doing nothing(cur_state of the
> cooling device remains unchanged).

And that's where the bug lies, I have sent proper fix for that now.

> Thus thermal remains stuck in a loop and
> never manages to actually throttle the cpu frequency. This ultimately leads
> to system shutdown in case of a thermal overheat event on big cpus.
 
> There are multiple possible fixes for this issue. Fundamentally,it is wrong
> for cpufreq driver and cpufreq cooling device driver to show different
> maximum possible state/frequency for a cpu.

Not actually, cpufreq core changes the max supported frequency at runtime based
on the availability of boost frequencies.

cpufreq_table_count_valid_entries() is used at different places and it is
implemented correctly.

-- 
viresh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ