Message-ID: 
 <CAGwozwHa3GSNGyRRp4=bR+Wsy2VLgwAbSmcdWb2=5rEyi7jdQw@mail.gmail.com>
Date: Wed, 29 Oct 2025 09:48:08 +0100
From: Antheas Kapenekakis <lkml@...heas.dev>
To: "Mario Limonciello (AMD) (kernel.org)" <superm1@...nel.org>
Cc: platform-driver-x86@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-hwmon@...r.kernel.org, Hans de Goede <hansg@...nel.org>,
	Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
	Derek John Clark <derekjohn.clark@...il.com>,
	Joaquín Ignacio Aramendía <samsagax@...il.com>,
	Jean Delvare <jdelvare@...e.com>, Guenter Roeck <linux@...ck-us.net>
Subject: Re: [PATCH v2 6/6] platform/x86: ayaneo-ec: Add suspend hook

On Wed, 29 Oct 2025 at 04:36, Mario Limonciello (AMD) (kernel.org)
<superm1@...nel.org> wrote:
>
>
>
> On 10/28/2025 4:39 PM, Antheas Kapenekakis wrote:
> > On Tue, 28 Oct 2025 at 22:21, Mario Limonciello <superm1@...nel.org> wrote:
> >>
> >> On 10/28/25 3:34 PM, Antheas Kapenekakis wrote:
> >>>>> The fan speed is also lost during hibernation, but since hibernation
> >>>>> failures are common with this class of devices
> >> Why are hibernation failures more common in this class of device than
> >> anything else?  The hibernation flow is nearly all done in Linux driver
> >> code (with the exception of ACPI calls that move devices into D3 and out
> >> of D0).
> >
> > I should correct myself here and say hibernation in general in Linux
> > leaves something to be desired.
> >
> > Until secure boot supports hibernation, that will be the case because
> > not enough people use it.
>
> The upstream kernel has no tie between UEFI secure boot and hibernation.
>   I think you're talking about some distro kernels that tie UEFI secure
> boot to lockdown.  Lockdown does currently prohibit hibernation.
>
> >
> > I have had it break for multiple reasons, not incl. the ones below and
> > the ones we discussed last year where games are loaded.
> >
> > For a few months I fixed some of the bugs but it is not sustainable.
> >
> >> Perhaps you're seeing a manifestation of a general issue that we're
> >> working on a solution for here:
> >>
> >> https://lore.kernel.org/linux-pm/20251025050812.421905-1-safinaskar@gmail.com/
> >>
> >> https://lore.kernel.org/linux-pm/20251026033115.436448-1-superm1@kernel.org/
> >>
> >> https://lore.kernel.org/linux-pm/5935682.DvuYhMxLoT@rafael.j.wysocki/T/#u
> >>
> >> Or if you're on an older kernel and using hybrid sleep we had a generic
> >> issue there as well which was fixed in 6.18-rc1.
> >>
> >> Nonetheless; don't make policy decisions based upon kernel bugs.  Fix
> >> the kernel bugs.
> >
> > My problem is I cannot in good conscience restore a fan speed before
> > the program responsible for it is guaranteed to thaw.
> >
> > The best solution I can come up with would be in freeze save if manual
> > control is enabled, disable it, and then on resume set a flag that
> > makes the first write to fan speed also set pwm to manual.
> >
> > This way suspend->hibernate flows, even if hibernation hangs when
> > creating the image, at least have proper fan control because they are
> > unattended, and resume hangs work similarly.
> >
> > Antheas
> >
>
> This sounds like a workable approach for what I understand to be your
> current design; but let me suggest some other ideas.
>
> What happens if you're running something big and the OOM comes and
> whacks the process?  Now you don't have fan control running anymore.
>
> So I see two options to improve things.
>
> 1) You can have userspace send a "heartbeat" to kernel space.  This can
> be as simple as a timestamp of reading a sysfs file.  If userspace
> doesn't read the file in X ms then you turn off manual control.

The OOM scenario is something I have not handled specifically yet, nor
have I seen it happen.

In the OOM case, systemd will restart the service after 5 seconds, and
in the case of a crash there are multiple fallbacks to ensure the
custom curve turns off.

Most of the hibernation hangs that I have experienced happen before
journald starts logging, so I assumed that they occur before userspace
unfreezes. I am also not sure whether restore() gets to run in those
cases.

Re: heartbeat, read below.

> 2) You move everything to a kthread.  Userspace can read some input
> options or maybe pick a few curve settings, but leave all the important
> logic in that kthread.

I think this is what Luke tried to do with the Zotac Zone. But in the
end, the kernel is limited in what calculations it can do (especially
floating point) and in what it can access, so you end up with a worse
curve with limited extensibility, and a driver-specific ABI. We also
risk duplicating all of this code across hwmon drivers and making it
harder to access.

I think this is part of the reason why the platform side of the Zotac
stuff has not been upstreamed, even though the driver itself is
otherwise pretty straightforward, with an established ABI by now.

It is also the reason we have not been able to add the module to
Bazzite, because 1) we cannot validate the new fan curve calculations
without a device and 2) they are worse than what we provide through
userspace (a polynomial ramp-up which embeds hysteresis to avoid
jittering, plus a choice between the Edge and Tctl sensors).

In summary, I think there would be great potential for a common set of
"hwmon" helpers that can use a temperature function and a speed set
function to handle a basic multi-point curve for basic, e.g., udev
use-cases. To that end, there could be a helper with a 5 second
timeout that turns off the custom speed. But it would be good for that
to be implemented globally, so it does not block device hw enablement.
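
As a rough model of what such a helper could compute, the sketch below
does integer-only linear interpolation over a multi-point curve and
pairs it with a timeout that drops back to automatic control. No such
hwmon helper exists today as far as this thread is concerned; the
names interpolate and CurveWatchdog, the example points and the way
time is passed in are all hypothetical.

```python
def interpolate(points, temp_c):
    """Map an integer temperature to a PWM duty using sorted (temp, pwm) points.

    Uses integer arithmetic only, mirroring the kernel's lack of floating point.
    """
    if temp_c <= points[0][0]:
        return points[0][1]
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if temp_c <= t1:
            return p0 + (p1 - p0) * (temp_c - t0) // (t1 - t0)
    return points[-1][1]


class CurveWatchdog:
    """Turns manual control off if userspace stops writing speeds."""

    def __init__(self, timeout_s=5.0):
        self.timeout_s = timeout_s
        self.last_kick = None
        self.manual = False

    def kick(self, now):
        # Called whenever userspace writes a fan speed.
        self.last_kick = now
        self.manual = True

    def poll(self, now):
        # Called periodically; returns True while manual control is allowed.
        if self.manual and now - self.last_kick > self.timeout_s:
            self.manual = False
        return self.manual
```

With points [(40, 0), (60, 128), (80, 255)], readings below 40 give 0
and readings above 80 give 255. The timestamps are passed in
explicitly so the timeout logic can be checked without sleeping.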

As far as this driver is concerned, I will handle the hibernation case
with a lazy resume, per what I said in the previous email.
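
For reference, the lazy resume described in the previous email can be
modelled as a small state machine. This is a userspace toy, not the
ayaneo-ec driver code: the class FanState and its method names are
made up for the example, and the real driver would talk to the EC
instead of setting attributes.

```python
class FanState:
    """Toy model of lazy resume: manual mode returns on the first PWM write."""

    def __init__(self):
        self.manual = False           # is the EC in manual PWM mode?
        self.pwm = 0                  # last duty written by userspace (0-255)
        self.was_manual = False       # saved at freeze time
        self.restore_pending = False  # set on resume, cleared by first write

    def write_pwm_enable(self, manual):
        # An explicit mode write always wins and cancels a pending restore.
        self.manual = manual
        self.restore_pending = False

    def write_pwm(self, value):
        if not 0 <= value <= 255:
            raise ValueError("pwm out of range")
        if self.restore_pending:
            # First write after resume: userspace is alive, go back to manual.
            self.manual = True
            self.restore_pending = False
        self.pwm = value

    def freeze(self):
        # Remember the mode, then hand the fan back to the EC.
        self.was_manual = self.manual
        self.manual = False

    def resume(self):
        # Do not touch the fan yet; only arm the flag.
        self.restore_pending = self.was_manual
        self.was_manual = False
```

If the system hangs anywhere between freeze() and the first write_pwm()
after resume(), the fan stays under automatic control, which is the
property that matters for unattended suspend-then-hibernate.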

Antheas
>
>