Message-ID: <8f7c2544-2b9d-4997-942a-5bd3ea72e3a3@roeck-us.net>
Date: Wed, 29 Oct 2025 03:22:27 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Antheas Kapenekakis <lkml@...heas.dev>,
 "Mario Limonciello (AMD) (kernel.org)" <superm1@...nel.org>
Cc: platform-driver-x86@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-hwmon@...r.kernel.org, Hans de Goede <hansg@...nel.org>,
 Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
 Derek John Clark <derekjohn.clark@...il.com>,
 Joaquín Ignacio Aramendía <samsagax@...il.com>,
 Jean Delvare <jdelvare@...e.com>
Subject: Re: [PATCH v2 6/6] platform/x86: ayaneo-ec: Add suspend hook

On 10/29/25 01:48, Antheas Kapenekakis wrote:
> On Wed, 29 Oct 2025 at 04:36, Mario Limonciello (AMD) (kernel.org)
> <superm1@...nel.org> wrote:
>>
>>
>>
>> On 10/28/2025 4:39 PM, Antheas Kapenekakis wrote:
>>> On Tue, 28 Oct 2025 at 22:21, Mario Limonciello <superm1@...nel.org> wrote:
>>>>
>>>> On 10/28/25 3:34 PM, Antheas Kapenekakis wrote:
>>>>>>> The fan speed is also lost during hibernation, but since hibernation
>>>>>>> failures are common with this class of devices
>>>> Why are hibernation failures more common in this class of device than
>>>> anything else?  The hibernation flow is nearly all done in Linux driver
>>>> code (with the exception of ACPI calls that move devices into D3 and out
>>>> of D0).
>>>
>>> I should correct myself here and say hibernation in general in Linux
>>> leaves something to be desired.
>>>
>>> Until secure boot supports hibernation, that will be the case because
>>> not enough people use it.
>>
>> The upstream kernel has no tie between UEFI secure boot and hibernation.
>> I think you're talking about some distro kernels that tie UEFI secure
>> boot to lockdown.  Lockdown does currently prohibit hibernation.
>>
>>>
>>> I have had it break for multiple reasons, not incl. the ones below and
>>> the ones we discussed last year where games are loaded.
>>>
>>> For a few months I fixed some of the bugs but it is not sustainable.
>>>
>>>> Perhaps you're seeing a manifestation of a general issue that we're
>>>> working on a solution for here:
>>>>
>>>> https://lore.kernel.org/linux-pm/20251025050812.421905-1-safinaskar@gmail.com/
>>>>
>>>> https://lore.kernel.org/linux-pm/20251026033115.436448-1-superm1@kernel.org/
>>>>
>>>> https://lore.kernel.org/linux-pm/5935682.DvuYhMxLoT@rafael.j.wysocki/T/#u
>>>>
>>>> Or if you're on an older kernel and using hybrid sleep we had a generic
>>>> issue there as well which was fixed in 6.18-rc1.
>>>>
>>>> Nonetheless; don't make policy decisions based upon kernel bugs.  Fix
>>>> the kernel bugs.
>>>
>>> My problem is I cannot in good conscience restore a fan speed before
>>> the program responsible for it is guaranteed to thaw.
>>>
>>> The best solution I can come up with would be, in freeze, to save
>>> whether manual control is enabled and disable it, and then on resume
>>> to set a flag that makes the first write to fan speed also set pwm to
>>> manual.
>>>
>>> This way suspend->hibernate flows, even if hibernation hangs when
>>> creating the image, at least have proper fan control because they are
>>> unattended, and resume hangs work similarly.
>>>
>>> Antheas
>>>
>>
>> This sounds like a workable approach for what I understand to be your
>> current design; but let me suggest some other ideas.
>>
>> What happens if you're running something big and the OOM comes and
>> whacks the process?  Now you don't have fan control running anymore.
>>
>> So I see two options to improve things.
>>
>> 1) You can have userspace send a "heartbeat" to kernel space.  This can
>> be as simple as a timestamp of reading a sysfs file.  If userspace
>> doesn't read the file in X ms then you turn off manual control.
> 
> The OOM scenario is something I have not handled specifically yet, nor
> have I had it happen.
> 
> Systemd will restart the service after 5 seconds in the OOM case, and
> in the case of a crash there are multiple fallbacks to ensure the
> custom curve turns off.
> 
> Most of the hibernation hangs that I have experienced happen before
> journalctl turns on, so I assumed that it's before userspace
> unfreezes. I am also not sure if restore() gets to run in those cases
> or not.
> 
> Re: heart beat, read below.
> 
>> 2) You move everything to a kthread.  Userspace can read some input
>> options or maybe pick a few curve settings, but leave all the important
>> logic in that kthread.
> 
> I think this is what Luke tried to do with the Zotac Zone. But in the
> end, the kernel is limited in what calculations it can do, esp.
> floating point, and in what it can access, so you end up with a worse
> curve with limited extensibility, and a driver-specific ABI. And we
> also risk duplicating all of this code on hwmon drivers and making it
> harder to access.
> 
> I think this is part of the reason why the platform side of the Zotac
> stuff has not been upstreamed, even though the driver itself other
> than that is pretty straightforward with an established ABI by now.
> And it is also the reason we have not been able to add the module to
> Bazzite, because 1) we cannot validate the new fan curve calculations
> without a device and 2) they are worse than what we provide through
> userspace (a polynomial ramp-up which embeds hysteresis to avoid
> jittering, plus choice for both Edge and Tctl sensors).
> 
> In summary, I think there would be great potential for a common set of
> "hwmon" helpers that can use a temperature function and a speed set
> function to handle a basic multi-point curve for basic, e.g., udev
> use-cases. To that end, there could be a helper with a 5 second
> timeout that turns off the custom speed. But it would be good for that
> to be implemented globally, so it does not block device hw enablement.
> 

Maybe I misunderstand. If so, apologies.

Thermal _control_ is what the thermal subsystem is for. hwmon is for
hardware monitoring, not control. You may do whatever you like
in platform drivers, including the duplication of thermal subsystem
functionality, but please do not get hwmon involved. That includes
any kind of helpers to compute any kind of temperature curves.

Thanks,
Guenter
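
Editorial sketch, not part of the original message: the "heartbeat"
idea quoted above (option 1) can be modelled in plain userspace Python
to make the timeout logic concrete. This is only an illustration under
stated assumptions. The names HeartbeatWatchdog, timeout_s, clock and
poll are hypothetical, nothing here is kernel code, and it does not
describe any existing driver or hwmon interface.

```python
import time


class HeartbeatWatchdog:
    """Userspace model: manual fan control is dropped when no heartbeat
    arrives within timeout_s seconds. All names are hypothetical."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.manual = False         # True while userspace owns the fan
        self.last_beat = None       # time of the most recent heartbeat

    def enable_manual(self):
        # Taking manual control counts as the first heartbeat.
        self.manual = True
        self.last_beat = self.clock()

    def heartbeat(self):
        # In the quoted proposal this would be a read of a sysfs file.
        self.last_beat = self.clock()

    def poll(self):
        """Run periodically. Returns True while manual control is active."""
        if self.manual and self.clock() - self.last_beat > self.timeout_s:
            # Heartbeat missed: hand the fan back to automatic control.
            self.manual = False
        return self.manual
```

A heartbeat that arrives after the timeout has fired does not re-enable
manual control in this model; userspace would have to call
enable_manual() again. That is a design choice of the sketch, not
something stated in the thread.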


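Editorial sketch, not part of the original message: the freeze/resume
approach described in the quoted text (drop manual control when
freezing, then let the first fan speed write after resume switch pwm
back to manual) can be modelled as a small state machine. This is a
userspace illustration only. The class FanControl and its attributes
(manual, pwm, was_manual, restore_manual) are hypothetical names, not
the ayaneo-ec driver's actual fields or callbacks.

```python
class FanControl:
    """Userspace model of the deferred-restore idea. Names are hypothetical."""

    def __init__(self):
        self.manual = False          # manual vs. automatic fan control
        self.pwm = 0                 # last requested fan speed
        self.was_manual = False      # state saved at freeze time
        self.restore_manual = False  # set on resume, consumed by first write

    def set_manual(self, enabled):
        self.manual = enabled
        self.restore_manual = False  # an explicit choice overrides the flag

    def freeze(self):
        # Remember the mode, then hand control back to automatic so a hang
        # during hibernation never leaves a stale manual speed in place.
        self.was_manual = self.manual
        self.manual = False

    def resume(self):
        # Do not restore the speed yet: the controlling program may not
        # have thawed. Only arm the flag.
        self.restore_manual = self.was_manual

    def write_pwm(self, value):
        # The first write after resume proves userspace is alive again,
        # so it also re-enables manual control.
        if self.restore_manual:
            self.manual = True
            self.restore_manual = False
        self.pwm = value
```

In this model, the fan stays under automatic control between freeze()
and the first write_pwm() call, which matches the stated goal of keeping
unattended suspend-then-hibernate flows safe if the process never
resumes.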