linux-kernel - Re: [PATCH v3 2/2] thermal/debugfs: Add thermal debugfs information for mitigation episodes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAJZ5v0hKH7tYDur4mG8QF0GfgL+jFfW_bBc3QesUS68OQb_PMA@mail.gmail.com>
Date: Wed, 27 Dec 2023 20:53:25 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Daniel Lezcano <daniel.lezcano@...aro.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, rjw@...ysocki.net, lukasz.luba@....com, 
	rui.zhang@...el.com, linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: [PATCH v3 2/2] thermal/debugfs: Add thermal debugfs information
 for mitigation episodes

Hi Daniel,

On Sat, Dec 23, 2023 at 12:41 PM Daniel Lezcano
<daniel.lezcano@...aro.org> wrote:
>
>
> Hi Rafael,
>
> On 21/12/2023 20:26, Rafael J. Wysocki wrote:
>
> [ ... ]
>
>
> >> +/**
> >> + * struct tz_events - Store all events related to a mitigation episode
> >> + *
> >> + * The tz_events structure describes a mitigation episode.
> >
> > So why not call it tz_mitigation?
>
> A mitigation episode = N x tz_events
>
> eg.
> trip A = passive cooling - cpufreq cluster0
> trip B = passive cooling - cpufreq cluster0 + cluster1
> trip C = active cooling + fan
>
> temperature trip A < trip B < trip C
>
> The mitigation episode, as defined, begins at trip A, and we can have
> multiple events (eg. trip B crossed several times, trip C, then trip B
> again etc ...).

I understand this, but I thought that tz_events represented the entire episode.

> [ ... ]
>
> >> +       if (dfs->tz.trip_index < 0) {
> >> +               tze = thermal_debugfs_tz_event_alloc(tz, now);
> >> +               if (!tze)
> >> +                       return;
> >> +
> >> +               list_add(&tze->node, &dfs->tz.tz_events);
> >> +       }
> >> +
> >> +       dfs->tz.trip_index++;
> >> +       dfs->tz.trips_crossed[dfs->tz.trip_index] = trip_id;
> >
> > So trip_index is an index into trips_crossed[] and the value is the ID
> > of the trip passed by thermal_debug_tz_trip_up() IIUC, so the trip IDs
> > in trips_crossed[] are always sorted by the trip temperature, in the
> > ascending order.
> >
> > It would be good to write this down somewhere in a comment.
> >
> > And what if trip temperatures change during a mitigation episode such
> > that the order by the trip temperature changes?
>
> Changing a trip point temperature during a mitigation is a general
> question about the thermal framework.
>
> How the governors will behave with such a change on the fly while they
> are in action?
>
> IMO, we should prevent to change a trip point temperature when this one
> is crossed and has a cooling device bound to it.

Well, it's not that simple.

There are legitimate cases in which this can happen on purpose.  For
example, the ACPI spec does not provide a way to get a trip hysteresis
from the platform firmware and instead it recommends the firmware to
update the trip temperature every time it is crossed on the way up in
order to implement hysteresis.

> >> +
> >> +       tze = list_first_entry(&dfs->tz.tz_events, struct tz_events, node);
> >> +       tze->trip_stats[trip_id].timestamp = now;
> >> +       tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, temperature);
> >> +       tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, temperature);
> >> +       tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
> >> +               (temperature - tze->trip_stats[trip_id].avg) /
> >> +               tze->trip_stats[trip_id].count;
> >> +
> >> +       mutex_unlock(&dfs->lock);
> >> +}
>
>
>
> --