Message-ID: <2348857.ElGaqSPkdT@rjwysocki.net>
Date: Thu, 18 Jul 2024 20:57:28 +0200
From: "Rafael J. Wysocki" <rjw@...ysocki.net>
To: Linux PM <linux-pm@...r.kernel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Lukasz Luba <lukasz.luba@....com>,
 Daniel Lezcano <daniel.lezcano@...aro.org>,
 Neil Armstrong <neil.armstrong@...aro.org>
Subject:
 [PATCH v1 0/2] thermal: core: Handle failed temperature checks more carefully

Hi Everyone,

This series kind of augments

https://lore.kernel.org/linux-pm/4950004.31r3eYUQgx@rjwysocki.net/

so I'm considering adding it to 6.11.

The problem with handling temperature check errors in
__thermal_zone_device_update() after the above is that if someone has a dead
thermal zone lurking somewhere in their system and continuously returning such
errors, they will get a flood of "temperature check failed" messages in the log,
which will be reported as a regression.  Rightfully so, because these messages
render the kernel log practically unusable, and the continuous, useless polling
of such a thermal zone may even prevent the system from entering deep idle
states.  Clearly, something needs to be done about this.

One possible approach might be to simply disable the thermal zone in question
after the first error (that is not -EAGAIN) returned by its .get_temp()
callback, but that cannot be done because there are thermal zones in which
.get_temp() returns errors to start with, but they recover later, and they
need to be taken into account.

So the only other alternative that is not overly complicated is to add a
back-off mechanism to the polling, so that the thermal zone has a chance to
recover, but the core will not wait for that forever.  At some point it will
just disable the thermal zone and let user space re-enable it if that's regarded
as a good idea.  This is done in the second patch; the first patch is
preparatory.
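For illustration only, the back-off idea can be sketched in plain C as below.
This is NOT the code from the series: struct tz_sketch, tz_next_poll(),
tz_user_enable(), and the BASE_DELAY_MS/MAX_DELAY_MS values are all
hypothetical, and doubling the interval is an assumed back-off policy.  It
shows a successful read restoring the normal interval, -EAGAIN being retried
without backing off, other errors doubling the interval, and the zone being
disabled once the interval hits its cap, until user space re-enables it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical values, not taken from the kernel. */
#define BASE_DELAY_MS 1000u
#define MAX_DELAY_MS  64000u

/* Hypothetical per-zone state for this sketch. */
struct tz_sketch {
	unsigned int delay_ms;	/* current polling interval */
	bool enabled;		/* false once the core has given up */
};

/*
 * Called with the return value of a .get_temp()-style read.
 * Returns the delay before the next poll in ms, or 0 if the zone is
 * (now) disabled and should not be polled any more.
 */
static unsigned int tz_next_poll(struct tz_sketch *tz, int ret)
{
	if (!tz->enabled)
		return 0;

	if (ret == 0) {
		/* The zone works (again): go back to normal polling. */
		tz->delay_ms = BASE_DELAY_MS;
		return tz->delay_ms;
	}

	/* Transient error: retry at the current interval, no back-off. */
	if (ret == -EAGAIN)
		return tz->delay_ms;

	/* Back-off exhausted: stop polling and wait for user space. */
	if (tz->delay_ms >= MAX_DELAY_MS) {
		tz->enabled = false;
		return 0;
	}

	/* Persistent error: back off by doubling the interval. */
	tz->delay_ms *= 2;
	return tz->delay_ms;
}

/* Models user space re-enabling the zone after it was disabled. */
static void tz_user_enable(struct tz_sketch *tz)
{
	tz->enabled = true;
	tz->delay_ms = BASE_DELAY_MS;
}
```

With these assumed values, a zone that keeps failing is polled after 2, 4, 8,
16, 32 and 64 seconds and is then disabled, so a dead zone stops flooding the
log while a zone that recovers in the meantime returns to normal polling.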

Thanks!
