lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <453e0df5-416b-476e-9629-c40534ecfb72@tuxedocomputers.com>
Date: Mon, 2 Dec 2024 15:37:40 +0100
From: Werner Sembach <wse@...edocomputers.com>
To: Hans de Goede <hdegoede@...hat.com>, Armin Wolf <W_Armin@....de>,
 ilpo.jarvinen@...ux.intel.com
Cc: linux-kernel@...r.kernel.org, platform-driver-x86@...r.kernel.org
Subject: Re: [RFC PATCH 0/1] platform: x86: tuxi: Implement TUXEDO TUXI ACPI
 TFAN as thermal subsystem

Hi Hans, hi Armin,

Am 02.12.24 um 10:40 schrieb Hans de Goede:
> Hi Armin, Werner,
>
> On 1-Dec-24 6:58 PM, Armin Wolf wrote:
>> Am 27.11.24 um 12:21 schrieb Werner Sembach:
>>
>>> Hi,
>>>
>>> Following up to https://lore.kernel.org/all/172b7acd-4313-4924-bcbc-41b73b39ada0@tuxedocomputers.com/ and https://lore.kernel.org/all/f26d867e-f247-43bb-a78b-be0bce35c973@roeck-us.net/ I experimented with the thermal subsystem and these are my results so far, but I'm hitting a bit of a wall:
>>>
>>> As far as I can tell to implement "2. As long as GTMP is > 80°C fan speed must be at least 30%." I would need to add a new gevenor, lets call it "user_space_with_safeguards". I would be nice when the temp <-> fanspeed relation could be passed via the thermal_trip structure. And safeguarding the hardware from userspace only works when I can restrict userspace from just selecting the preexisting "user_space" govenor.
>>>
>>> So my ideas/questions:
>>> - Add a field "min_fanspeed_percent" to the thermal_trip struct that will only be used by the "user_space_with_safeguards" govenor
>>> - Add a "user_space_with_safeguards" govenor that is the same as the "user_space" govenor, but on trip, a minimum speed is applied
>>> - How can i ensure that on further speed updates the min speed is applied to? I could just implement it directly in the cdev, but that would be spagetti coding around the govenor.
>>> - Can I somehow restrict userspace from using certain govenors?
>>> - I'm a litte bit confused about the thermal zone "mode" sysfs switch, here it says deactivate for userspace control: https://elixir.bootlin.com/linux/v6.12/source/Documentation/ABI/testing/sysfs-class-thermal#L20, but what about the user_space govenor then?
>> Hi,
>>
>> i am having little experience with the thermal subsystem, but i suggest that policy decisions like "min_fanspeed_percent" should either:
>>
>> - come from the hardware/firmware itself
>> - come from userspace
>>
>> Effectively this driver tries to enforce a Tuxedo-specific policy that is not directly based on hardware limits. The book "Linux Device Drivers"
>> says:
>>
>>      "the role of a device driver is providing///mechanism/, not/policy/."

But it is a hardware limit, as the hardware will wear and might break if not run 
within its limits.

As this driver is for already released hardware, a firmware fix is not possible.

>>
>> Furthermore:
>>
>>      "When/writing/ drivers, a programmer should pay particular attention to this fundamental concept: write kernel code to access the hardware,
>>       but don't force particular policies on the user, since different users have different needs. The driver should deal with making the hardware
>>       available, leaving all the issues about/how/ to use the hardware to the applications. A driver, then, is flexible if it offers access to the
>>       hardware capabilities without adding constraints."
>>
>> The issue is that the Tuxedo-specific policy is not directly connected with the hardware. Some hardware might need a bigger minimum fan speed than
>> other hardware for example.

Currently this driver is for exactly 2 TUXEDO devices (Sirius 16 gen 1 & 2).

Future devices can have different limits provided by the ACPI or a list in the 
driver.

>>
>> The hardware (or rather firmware in this case) should communicate those constraints to the linux driver so that we do not need to rely on random
>> temperatures for hardware protection.
But then the driver still needs to enforce them.
>>
>> This ACPI interface however basically provides us with a hwmon interface and Tuxedo now wants the kernel to enforce their policy on it. I suspect that
>> happens for warranty reasons, right?

The ACPI interface is only present on TUXEDO devices (TUXI stands for TUXEDO 
Interface and TFAN for TUXEDO FAN). And we want the policy for the devices to 
live longer, not only for warranty.

TUXI meant as a simple getter and setter interface for the EC. We don't have 
access to the EC firmware source so more complex stuff needs to be done 
somewhere else, i.e. the driver.

>>
>> Maybe there is a way to enforce this policy through userspace applications?
That's no enforcement as userspace can just write the sysfs directly.
> An important role of device-drivers is also to avoid driving hardware outside
> its valid specifications. E.g. charger drivers in the power_supply subsystem
> have constant_charge_current and constant_charge_voltage settings which
> give a max charging speed in Amperes and a max charging Voltage (typically
> 4.2V for single Li-xxx cell batteries).
>
> Setting these too high can be very dangerous leading to batteries catching
> fire / exploding. so naturally the drivers also have and enforce
> constant_charge_current_max and constant_charge_voltage_max sysfs attributes
> and the values there do not reflect the maximum what the charger-chip can
> handle, but the maximum which the manufacturer has specified for the battery.
>
> This info can e.g. come from devicetree but if not present then the values
> applied by the firmware at boot should be read back and used not only
> as active, but also as max values.
>
> I believe what Werner wants is pretty much the same thing, allow userspace
> to provide its own fan management, but have the kernel monitor the temperature
> and if the temperature goes say above 80° Celsius, enforce a min fan-speed of
> 30% and at 90° enforce a fan-speed of say 70%.
>
> Yes the exact values are a for of policy, but this is a policy to protect
> hw from getting damaged and as such is fine to have in the kernel.
>
> As for how to implement this with the thermal subsystem (which indeed looks
> like it might be the way to go) I'm not familiar enough with the thermal
> subsystem either.
>
> Werner, I suggest that you send an email to the thermal subsystem maintainers
> and ask them about this. It would be good to start with explaining your
> problem before asking the questions. I would leave out details about the ACPI
> interface. Just say you have a temp-sensor + pwm to control fan speed combination
> where you want to allow userspace control while having the kernel monitor
> things and have the kernel overriding userspace's fan speed setting when
> things get too hot, to deal with e.g. the userspace process crashing at
> a low fan speed setting.
Ack
>
> Regards,
>
> Hans
>
Best regards,

Werner


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ