lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0gytZB=WcEhFkuoOOf_X1=QBYy7+FnVEWLavXzFZc5E=g@mail.gmail.com>
Date: Tue, 25 Jun 2024 14:33:01 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Johan Hovold <johan@...nel.org>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, Linux PM <linux-pm@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, Daniel Lezcano <daniel.lezcano@...aro.org>, 
	Lukasz Luba <lukasz.luba@....com>, 
	Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>, Zhang Rui <rui.zhang@...el.com>, 
	Jens Glathe <jens.glathe@...schoolsolutions.biz>, Steev Klimaszewski <steev@...i.org>, 
	Johan Hovold <johan+linaro@...nel.org>
Subject: Re: [PATCH v1] thermal: gov_step_wise: Go straight to instance->lower
 when mitigation is over

On Tue, Jun 25, 2024 at 9:34 AM Johan Hovold <johan@...nel.org> wrote:
>
> On Sat, Jun 22, 2024 at 02:26:33PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > Commit b6846826982b ("thermal: gov_step_wise: Restore passive polling
> > management") attempted to fix a Step-Wise thermal governor issue
> > introduced by commit 042a3d80f118 ("thermal: core: Move passive polling
> > management to the core"), which caused the governor to leave cooling
> > devices in high states, by partially revering that commit.
>
> typo: reverting

Thanks!

> > However, this turns out to be insufficient on some systems due to
> > interactions between the governor code restored by commit b6846826982b
> > and the passive polling management in the thermal core.
>
> Care to elaborate on what went wrong here? In my test of the previous
> fix I saw the frequency ramping up in steps as expected when the
> temperature dropped. Under what circumstances would that fail to happen?

System suspend-resume would interfere with that as it would call
thermal_zone_device_init().

> > For this reason, revert commit b6846826982b and make the governor set
> > the target cooling device state to the "lower" one as soon as the zone
> > temperature falls below the threshold of the trip point corresponding
> > to the given thermal instance, which means that thermal mitigation is
> > not necessary any more.
> >
> > Before this change the "lower" cooling device state would be reached in
> > steps through the passive polling mechanism which was questionable for
> > three reasons: (1) cooling device were kept in high states when that was
> > not necessary (and it could adversely impact performance), (2) it only
> > worked for thermal zones with nonzero passive_delay_jiffies value, and
> > (3) passive polling belongs to the core and should not be hijacked by
> > governors for their internal purposes.
>
> I've tested this patch on the Lenovo ThinkPad X13s, where I could
> reproduce the rc1 regression, and things works as intended with the
> fix applied to rc5:
>
> Tested-by: Johan Hovold <johan+linaro@...nel.org>
>
> The CPU frequency still oscillates heavily but now with a more
> sawtoothed curve.
>
> Not sure if it helps with performance, though, as running the CPU at
> full speed as soon as we drop below the threshold (with hysteresis)
> also means that we get back to running at the lowest frequency even
> faster.

True, but assuming that the reason for the mitigation is still there.
If it's actually gone, it's better to stop cooling as soon as it can
be done.

Thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ