[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de>
Date: Sat, 22 Nov 2025 15:18:11 +0100
From: Armin Wolf <W_Armin@....de>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>,
Zhang Rui <rui.zhang@...el.com>, Lukasz Luba <lukasz.luba@....com>,
Len Brown <lenb@...nel.org>, Jonathan Corbet <corbet@....net>,
Ido Schimmel <idosch@...dia.com>, Petr Machata <petrm@...dia.com>,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
etnaviv@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
linux-tegra@...r.kernel.org, linux-acpi@...r.kernel.org,
linux-doc@...r.kernel.org, netdev@...r.kernel.org,
linux-wireless@...r.kernel.org, ath10k@...ts.infradead.org,
ath11k@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org, platform-driver-x86@...r.kernel.org,
linux-pci@...r.kernel.org, imx@...ts.linux.dev,
linux-renesas-soc@...r.kernel.org
Subject: Re: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent
device of thermal zone/cooling devices
Am 21.11.25 um 21:35 schrieb Rafael J. Wysocki:
> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <W_Armin@....de> wrote:
>> Drivers registering thermal zone/cooling devices are currently unable
>> to tell the thermal core what parent device the new thermal zone/
>> cooling device should have, potentially causing issues with suspend
>> ordering
> This is one potential class of problems that may arise, but I would
> like to see a real example of this.
>
> As it stands today, thermal_class has no PM callbacks, so there are no
> callback execution ordering issues with devices in that class and what
> other suspend/resume ordering issues are there?
Correct, that is why i said "potentially".
>
> Also, the suspend and resume of thermal zones is handled via PM
> notifiers. Is there a problem with this?
The problem with PM notifiers is that thermal zones stop working even before
user space is frozen. Freezing user space might take a lot of time, so having
no thermal management during this period is less than ideal.
This problem would not occur when using dev_pm_ops, as thermal zones would be
suspended after user space has been frozen successfully. Additionally, when using
dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates
that no new devices (including thermal zones and cooling devices) be registered during
a suspend/resume cycle.
Replacing the PM notifiers with dev_pm_ops would of course be a optimization with
its own patch series.
>> and making it impossible for user space applications to
>> associate a given thermal zone device with its parent device.
> Why does user space need to know the parent of a given cooling device
> or thermal zone?
Lets say that we have two thermal zones registered by two instances of the
Intel Wifi driver. User space is currently unable to find out which thermal zone
belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string ("iwlwifi[0-X]").
This problem would be solved once we populate the parent device pointer inside the thermal zone
device, as user space can simply look at the "device" symlink to determine the parent device behind
a given thermal zone device.
Additionally, being able to access the acpi_handle of the parent device will be necessary for the
ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors.
>> This patch series aims to fix this issue by extending the functions
>> used to register thermal zone/cooling devices to also accept a parent
>> device pointer. The first six patches convert all functions used for
>> registering cooling devices, while the functions used for registering
>> thermal zone devices are converted by the remaining two patches.
>>
>> I tested this series on various devices containing (among others):
>> - ACPI thermal zones
>> - ACPI processor devices
>> - PCIe cooling devices
>> - Intel Wifi card
>> - Intel powerclamp
>> - Intel TCC cooling
> What exactly did you do to test it?
I tested:
- the thermal zone temperature readout
- correctness of the new sysfs links
- suspend/resume
I also verified that ACPI thermal zones still bind with the ACPI fans.
>> I also compile-tested the remaining affected drivers, however i would
>> still be happy if the relevant maintainers (especially those of the
>> mellanox ethernet switch driver) could take a quick glance at the
>> code and verify that i am using the correct device as the parent
>> device.
> I think that the above paragraph is not relevant any more?
You are right, however i originally meant to CC the mellanox maintainers as
i was a bit unsure about the changes i made to their driver. I will rework
this section in the next revision and CC the mellanox maintainers.
>
>> This work is also necessary for extending the ACPI thermal zone driver
>> to support the _TZD ACPI object in the future.
> I'm still unsure why _TZD support requires the ability to set a
> thermal zone parent device.
_TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans
and ACPI processors, like ACPI batteries. This however will currently not work as
the ACPI thermal zone driver uses the private drvdata of the cooling device to
determine if said cooling device should bind. This only works for ACPI fans and
processors due to the fact that those drivers store a ACPI device pointer inside
drvdata, something the ACPI thermal zone expects.
As we cannot require all cooling devices to store an ACPI device pointer inside
their drvdata field in order to support ACPI, we must use a more generic approach.
I was thinking about using the acpi_handle of the parent device instead of messing
with the drvdata field, but this only works if the parent device pointer of the
cooling device is populated.
(Cooling devices without a parent device would then be ignored by the ACPI thermal
zone driver, as such cooling devices cannot be linked to ACPI).
>
>> Signed-off-by: Armin Wolf <W_Armin@....de>
>> ---
>> Armin Wolf (8):
>> thermal: core: Allow setting the parent device of cooling devices
>> thermal: core: Set parent device in thermal_of_cooling_device_register()
>> ACPI: processor: Stop creating "device" sysfs link
> That link is not to the cooling devices' parent, but to the ACPI
> device object (a struct acpi_device) that corresponds to the parent.
> The parent of the cooling device should be the processor device, not
> its ACPI companion, so I'm not sure why there would be a conflict.
From the perspective of the Linux device core, a parent device does not have to be
a "physical" device. In the case of the ACPI processor driver, the ACPI device is used,
so the cooling device registered by said driver belongs to the ACPI device. I agree
that using the Linux processor device would make more sense, but this will require
changes inside the ACPI processor driver.
As for the "device" symlink: The conflict would be a naming conflict, as both "device" symlinks
(the one created by the ACPI processor driver and the one created by the device core) will
be created in the same directory (which is the directory of the cooling device).
>> ACPI: fan: Stop creating "device" sysfs link
>> ACPI: video: Stop creating "device" sysfs link
> Analogously in the above two cases AFAICS.
>
> The parent of a cooling device should be a "physical" device object,
> like a platform device or a PCI device or similar, not a struct
> acpi_device (which in fact is not a device even).
From the perspective of the Linux device core, a ACPI device is a perfectly valid device.
I agree that using a platform device or PCI device is better, but this already happens
inside the ACPI fan driver (platform device).
Only the ACPI video driver created a "device" sysfs link that points to the ACPI device
instead of the PCI device. I just noticed that i accidentally changed this by using the
PCI device as the parent device for the cooling device.
If you want then we can keep this change.
>> thermal: core: Set parent device in thermal_cooling_device_register()
>> ACPI: thermal: Stop creating "device" sysfs link
> And this link is to the struct acpi_device representing the thermal zone itself.
Correct, the ACPI thermal zone driver is a ACPI driver, meaning that he binds to
ACPI devices. Because of this all (thermal zone) devices created by an instance of
said driver are descendants of the ACPI device said instance is bound to.
We can of course convert the ACPI thermal zone driver into a platform driver, but
this would be a separate patch series.
>> thermal: core: Allow setting the parent device of thermal zone devices
> I'm not sure if this is a good idea, at least until it is clear what
> the role of a thermal zone parent device should be.
Take a look at my explanation with the Intel Wifi driver.
Thanks,
Armin Wolf
Powered by blists - more mailing lists