linux-kernel - Re: [PATCH v2] platform/x86/tuxedo: Implement TUXEDO TUXI ACPI TFAN via hwmon

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ad2e303d-1b13-d575-d58c-f4785e71d6e7@linux.intel.com>
Date: Fri, 14 Mar 2025 15:59:15 +0200 (EET)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Werner Sembach <wse@...edocomputers.com>
cc: Hans de Goede <hdegoede@...hat.com>, Jean Delvare <jdelvare@...e.com>, 
    Guenter Roeck <linux@...ck-us.net>, LKML <linux-kernel@...r.kernel.org>, 
    platform-driver-x86@...r.kernel.org, linux-hwmon@...r.kernel.org
Subject: Re: [PATCH v2] platform/x86/tuxedo: Implement TUXEDO TUXI ACPI TFAN
 via hwmon

On Fri, 14 Mar 2025, Werner Sembach wrote:

> Sorry, resend, mail client did html message by accident

Np.

> Am 14.03.25 um 11:05 schrieb Ilpo Järvinen:

> > > > > +			S32_MAX : (retval - TUXI_FW_TEMP_OFFSET) *
> > > > > 100;
> > > > Is the math wrong, as you do retval - TUXI_FW_TEMP_OFFSET before
> > > > multiplying?
> > > No, retval is in 10th of °K (but the last number is always 0) so is
> > > TUXI_FW_TEMP_OFFSET which is there to convert it to 10th of °C, the * 100
> > > is
> > > then to bring it in line with hwmon wanting to output milli degrees
> > So is result of S32_MAX correct when retval is 21474837?
> > 
> > (21474837-2730)*100
> > 2147210700
> > 2^31-1
> > 2147483647
> > 
> > 2147210700 would have been representable but the upper bound is
> > still applied (the value might be large enough to not have practical
> > significance but to me the code looks still illogical why it applies the
> > bound prematurely).
>
> Yeah my though was: this check is only here to catch the firmware doing some
> crazy stuff and sending highly unrealistic values, so gifting a small bit of
> the available range away doesn't matter

But it does matter as you could note. I stumbled on the logic which didn't 
look right while reviewing. You even claimed afterwards is not wrong when 
I raised this. :-/

Please just correct the logic so it makes sense to the code reader, there 
seems to be no well justified reason to keep the illogical code even if 
the practical impact is very low. It's probably done the way it is only 
because the variable types are what they are so you couldn't do the 
subtraction like I proposed ;-). At minimum you'd need to add a comment to 
warn about the inconsistency at which point rewriting to correct logic is 
already way simpler.

> > I see you already sent another version, it would have been prudent to wait
> > a bit longer as you contested some of the comments so you could have seen
> > my replies before sending the next version.
>
> I'm sorry. I just wanted to show that I'm iterating as I wait for the reply if
> the design with the periodic safeguard is acceptable. If that's gets rejected
> this driver must be rewritten anyway.

Kernel development is not a sprint. It's better to avoid sending versions 
unnecessarily, a day or two isn't worth it when compared with ending up 
into people's low priority bin which will inevitably happen when the 
version counter starts to grow beyond v5-6.

I (and likely others too) appreciate if they don't have to waste review 
cycles on something that is not "complete" because we have to look at the 
completed one later too. Maintainers work in good faith that developers 
are simply improving their patches (or working on some other great 
improvements to the kernel :-)) while nothing seemingly happens for a 
while. There's no need to prove that something is going on just for the 
sake of proving.

Obviously RFC patches are still fine to ask specific questions about 
something, but that's not about proving progress (in fact, RFC patches are 
more about being "stuck" than about making progress).

> > > > Shouldn't it be like this:
> > > > 
> > > > 		retval -= TUXI_FW_TEMP_OFFSET;
> > > > 		*val = min(retval * 100, (unsigned long long)S32_MAX);
> > > As retval is unsigned this would not work with (theoretical) negative °C.
> > So your code relies on implicit type conversion in this: (retval -
> > TUXI_FW_TEMP_OFFSET) ?
> 
> I can add an explicit cast, np.
> 
> [snip]
> 
> > > > > +	}
> > > > > +	if (temp >= temp_high)
> > > > > +		ret = i;
> > Now that I reread things, is this also incorrect, as "i" is at the
> > terminator entry at this point?
> 
> Yes that's intentional, the 3 entries in the array open up 4 ranges:
> 
> lower then 1st entry i=0, between 1st and 2nd entry i=1, 2nd and 3rd i=2,
> higher then 3rd i=3 (the value that terminates the for loop)

I didn't realize that. To me { } looks just an terminating entry. So 
what's the min_speed going to be for that last entry since it's 
initialized to 0?

Oh, I see you're taking .min_speed from temp_levels[temp_level - 1] which
I don't like either. You have a "state" and then store min_speed for the 
state into other index inside the array?!?

> > > > > +
> > > > > +		temp = retval > S32_MAX / 100 ?
> > > > > +			S32_MAX : (retval - TUXI_FW_TEMP_OFFSET) *
> > > > > 100;
> > > > Same math issue comment as above.
> > > > 
> > > > Why is the read+conversion code duplicated into two places?
> > > because here it is with special error handling and didn't thought about an
> > > own
> > > function for a defacto 2 liner
> > A function that does read+conversion would be 6-8 lines with the error
> > handling.
> 
> I can add it.
> 
> [snip]
> 
> Thanks for the code review again.
> 
> 
> Last but not least: As already mentioned, I still wonder if the design with
> the periodic safeguard is ok or not or?

I'm not sure if fully understand Daniel's suggestion [1] as it 
doesn't specify who/what is sending that notification to the thermal 
engine.

[1] https://lore.kernel.org/all/286f5efc-cd15-4e0b-bec2-2e9bbb93dd37@linaro.org/#t


When it comes to your own concerns, I'm not exactly buying the argument 
that userspace can do dangerous things. Yeah, it can shoot one's own 
foot, no doubt, such as unloading this driver and there goes your periodic 
safeguards. If the argument would be that userspace fails to respond (in 
time), I would have less trouble in accepting that argument.

-- 
 i.