Date:   Tue, 20 Jun 2023 21:05:07 +0200
From:   Daniel Lezcano <daniel.lezcano@...aro.org>
To:     Eduardo Valentin <evalenti@...nel.org>, eduval@...zon.com,
        linux-pm@...r.kernel.org
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Amit Kucheria <amitk@...nel.org>,
        Zhang Rui <rui.zhang@...el.com>,
        Jonathan Corbet <corbet@....net>, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/7] thermal: enhancements on thermal stats


Hi Eduardo,

On 19/05/2023 05:27, Eduardo Valentin wrote:
> Hello Rafael and Daniel
> 
> After a long hiatus, I am returning to more frequent contributions
> to the thermal subsystems, at least until I drain some of the
> commits I have in my trees.
> 
> This is a first series of several that will come as improvements
> on the thermal subsystem that will enable using this subsystem
> in the Baseboard Management Controller (BMC) space, as part
> of the Nitro BMC project. To do so, there were a few improvements
> and new features written.
> 
> In this series in particular, I present a set of enhancements
> on how we are handling statistics. The cooling device stats
> are awesome, but I added a few new entries there. I also
> introduce stats per thermal zone here too.

From my POV, that kind of information belongs in debugfs; sysfs is not
suitable for it.

The cdev stats are a total mess because of the page size limitation of
sysfs and the combinatorial explosion when there is a large number of
states (e.g. a display with 1024 cooling device states results in a
1024 x 1024 matrix, so more than 4MB of memory).
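
To put a number on the matrix size mentioned above, here is a minimal
sketch of the arithmetic. It assumes one 4-byte counter per cell of the
state-transition matrix; the counter width is an assumption, it is not
stated in this mail, and `trans_table_bytes` is a hypothetical helper
name used only for illustration.

```python
def trans_table_bytes(num_states: int, counter_size: int = 4) -> int:
    """Bytes needed for a num_states x num_states matrix of counters.

    counter_size=4 is an assumed counter width, not taken from the mail.
    """
    return num_states * num_states * counter_size


size = trans_table_bytes(1024)
print(size, size / (1024 * 1024))  # 4194304 4.0
```

Under that assumption the matrix alone is 4 MiB, and any additional
per-state bookkeeping pushes the total above it.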

For the record, I'm working on this kind of statistics [1][2], and have
optimized the cooling device statistics in order to get rid of the
existing sysfs cdev stats.

Actually, all the stats rely on the mitigation episodes. However, for
that we need to correctly identify when they begin and when they end. We
can have a mitigation episode inside another mitigation episode (e.g.
passive mitigation@...p0 and active mitigation@...p1).
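
The nesting described above can be modelled with one open episode per
trip point. This is only an illustrative sketch, not the code from
[1][2]: the `EpisodeTracker` class, its method names, the trip labels
and the timestamps are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodeTracker:
    # trip label -> timestamp at which its currently open episode began
    started: dict = field(default_factory=dict)
    # trip label -> accumulated mitigation time over all closed episodes
    total: dict = field(default_factory=dict)

    def trip_crossed_up(self, trip, now):
        # Begin an episode for this trip unless one is already open.
        self.started.setdefault(trip, now)

    def trip_crossed_down(self, trip, now):
        # End the open episode for this trip, if any, and account for it.
        start = self.started.pop(trip, None)
        if start is not None:
            self.total[trip] = self.total.get(trip, 0) + (now - start)


tracker = EpisodeTracker()
tracker.trip_crossed_up("trip0", 0)     # outer (e.g. passive) episode begins
tracker.trip_crossed_up("trip1", 5)     # inner (e.g. active) episode begins
tracker.trip_crossed_down("trip1", 8)   # inner episode ends
tracker.trip_crossed_down("trip0", 20)  # outer episode ends
print(tracker.total["trip0"], tracker.total["trip1"])  # 20 3
```

The accounting is only as good as the crossing events fed into it: a
missed or spurious crossing shifts the begin or end of an episode and
every figure derived from it.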

This does not work today because the trip point detection is incorrect:
the mitigation episodes are therefore also incorrect and, consequently,
so are the stats.

There are more details at [3], but that change assumes the trip points
are sorted in ascending order, which is wrong; that is why it was not
merged.
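
As an illustration of why the ordering assumption matters, a detection
that tests every trip point independently does not depend on how the
trips are sorted. The sketch below is not the change at [3] nor the
generic trip points work; `crossed_trips`, the trip labels and the
temperature values are hypothetical, and hysteresis is deliberately
left out.

```python
def crossed_trips(trips, prev_temp, cur_temp):
    """Return (crossed_up, crossed_down) lists of trip labels.

    trips maps a trip label to its temperature threshold. Each trip is
    compared against the previous and current temperature on its own,
    so the order of the entries is irrelevant.
    """
    up = [name for name, t in trips.items() if prev_temp < t <= cur_temp]
    down = [name for name, t in trips.items() if cur_temp < t <= prev_temp]
    return up, down


# Trips deliberately listed in descending temperature order.
trips = {"trip0": 90000, "trip1": 70000}
print(crossed_trips(trips, 60000, 95000))  # (['trip0', 'trip1'], [])
print(crossed_trips(trips, 95000, 80000))  # ([], ['trip0'])
```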

The mitigation works but the detection is fuzzy, so the math is
inaccurate. As we are at the boundaries of a temperature limit, the
resulting statistics, when they are not totally inconsistent, do not
show us the information needed to optimize the governors.

All the work around the generic trip points is to fix that.

There is a proposal at LPC to add statistics/debug information for
thermal; maybe you can participate so we can join our efforts?

   -- Daniel

[1] https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/log/?h=thermal/trip-crossed%2bdebugfs

[2] https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/log/?h=thermal/debugfs-v2

[3] https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/commit/?h=thermal/trip-crossed%2bdebugfs&id=7d713a9128ad9a153de9c3f5b854c6f1acfb3064



-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs