lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y+JfwM0O/0ZjMHua@orome>
Date:   Tue, 7 Feb 2023 15:27:12 +0100
From:   Thierry Reding <thierry.reding@...il.com>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>
Cc:     Wei Ni <wni@...dia.com>, "Rafael J. Wysocki" <rafael@...nel.org>,
        Jon Hunter <jonathanh@...dia.com>,
        Johan Hovold <johan@...nel.org>,
        Philipp Zabel <p.zabel@...gutronix.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>,
        Linux PM mailing list <linux-pm@...r.kernel.org>
Subject: Re: thermal/drivers/tegra: Getting rid of the get_thermal_instance()
 usage

On Tue, Feb 07, 2023 at 01:38:08PM +0100, Daniel Lezcano wrote:
> On 07/02/2023 13:18, Thierry Reding wrote:
> > On Mon, Feb 06, 2023 at 03:50:22PM +0100, Daniel Lezcano wrote:
> > > 
> > > Hi Thierry,
> > > 
> > > did you have the time to look at the get_thermal_instance() removal ?
> > > 
> > > 
> > > On 26/01/2023 13:55, Thierry Reding wrote:
> > > 
> > > > 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > > > 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > > > 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > > > 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > > > 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > > > 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > > > 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > > > 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > > > 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> > > > 
> > > > and after these changes, it turns into:
> > > > 
> > > > 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > > > 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > > > 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > > > 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > > > 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > > > 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > > > 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > > > 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > > > 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
> > > > 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
> > > > 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> > > > 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
> > > > 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
> > > > 
> > > > The "programming throttle ..." messages are something I've added locally
> > > > to trace what gets called. So it looks like for "pll" and "mem" thermal
> > > > zones, we now program trip points whereas we previously didn't.
> > 
> > My diagnosis above wasn't entirely correct. We're not actually skipping
> > trip point programming for PLL and MEM thermal zones in the current
> > code. Instead, we skip throttle programming. As far as I can tell this
> > is a mechanism built into ACTMON to allow it to automatically throttle
> > when a zone reaches a certain temperature.
> > 
> > This is modelled as a cooling device, but internally it's actually done
> > automatically, which is why we have this code that programs the throttle
> > at driver probe time, rather than the on-demand programming that typical
> > cooling device would do (such as a fan).
> > 
> > The reason why we have get_thermal_instance() here is to check if this
> > built-in cooling device has been configured for the "hot" trip point. If
> > not, we don't want the throttle programming to happen. This adds the
> > added flexibility of explicitly disabling the automatic throttling by
> > ACTMON and using another cooling device (or none at all) if that's what
> > is needed.
> > 
> > Dropping just the call to get_thermal_instance() and relying on the
> > find_throttle_cfg_by_name() function will always return a valid throttle
> > configuration. This is slightly obfuscated because of this:
> > 
> > 	cdev = ts->throt_cfgs[i].cdev;
> > 	if (get_thermal_instance(tz, cdev, trip_id))
> > 		stc = find_throttle_cfg_by_name(ts, cdev->type);
> > 
> > As far as I can tell this will always return &ts->throt_cfgs[i], so the
> > find_throttle_cfg_by_name() call is a bit redundant here. I'll look into
> > fixing that.
> > 
> > In any case, the important thing is that it would always find a valid
> > throttle configuration and therefore program the throttle, even if we
> > may not want to.
> 
> Why not rely on the thermal framework mechanism to set the hwtrpis ?
> 
> thermal_zone_device_register() calls thermal_zone_device_update(). This one
> calls thermal_zone_set_trips() which programs the hardware trip point.
> 
> When we suspend/resume, the PM notifiers are calling
> thermal_zone_device_update() which in turn sets the hw trip points.
> 
> May be I'm missing something but isn't enough for the sensor ?

These aren't actually trip points getting programmed, but rather the
built-in throttling mechanism. That said, it might be possible to append
that programming to the driver's ->set_trips() implementation. I'll look
into that.

Thanks for the suggestion,
Thierry

> 
> 
> > Possibly we could work around that by removing this fiddly special case
> > and instead add a new callback for the cooling devices that can be run
> > when they are bound to a thermal zone. This would allow the throttle
> > programming to be initiated from within the thermal core rather than
> > "bolted on" like it is now and should allow us to achieve the same
> > effect but without calling into get_thermal_instance().
> > 
> > I'll try and prototype this, but feel free to suggest anything better if
> > you can think of something.
> > 
> > Thierry
> 
> -- 
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> 
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
> 

Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ