Message-ID: <ad2e303d-1b13-d575-d58c-f4785e71d6e7@linux.intel.com>
Date: Fri, 14 Mar 2025 15:59:15 +0200 (EET)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Werner Sembach <wse@...edocomputers.com>
cc: Hans de Goede <hdegoede@...hat.com>, Jean Delvare <jdelvare@...e.com>,
Guenter Roeck <linux@...ck-us.net>, LKML <linux-kernel@...r.kernel.org>,
platform-driver-x86@...r.kernel.org, linux-hwmon@...r.kernel.org
Subject: Re: [PATCH v2] platform/x86/tuxedo: Implement TUXEDO TUXI ACPI TFAN
via hwmon
On Fri, 14 Mar 2025, Werner Sembach wrote:
> Sorry, resend, mail client did html message by accident
Np.
> Am 14.03.25 um 11:05 schrieb Ilpo Järvinen:
> > > > > + S32_MAX : (retval - TUXI_FW_TEMP_OFFSET) *
> > > > > 100;
> > > > Is the math wrong, as you do retval - TUXI_FW_TEMP_OFFSET before
> > > > multiplying?
> > > No, retval is in tenths of a kelvin (the last digit is always 0), and so
> > > is TUXI_FW_TEMP_OFFSET, which is there to convert it to tenths of °C. The
> > > * 100 then brings it in line with hwmon, which wants to output
> > > millidegrees.
> > So is result of S32_MAX correct when retval is 21474837?
> >
> > (21474837-2730)*100
> > 2147210700
> > 2^31-1
> > 2147483647
> >
> > 2147210700 would have been representable, but the upper bound is
> > still applied (the value might be large enough to have no practical
> > significance, but to me the code still looks illogical because it applies
> > the bound prematurely).
>
> Yeah, my thought was: this check is only here to catch the firmware doing some
> crazy stuff and sending highly unrealistic values, so giving away a small bit
> of the available range doesn't matter
But it does matter, as you can see. I stumbled on the logic, which didn't
look right, while reviewing. You even claimed afterwards that it is not wrong
when I raised this. :-/
Please just correct the logic so it makes sense to the code reader, there
seems to be no well justified reason to keep the illogical code even if
the practical impact is very low. It's probably done the way it is only
because the variable types are what they are so you couldn't do the
subtraction like I proposed ;-). At minimum you'd need to add a comment to
warn about the inconsistency at which point rewriting to correct logic is
already way simpler.
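
For illustration only, a minimal userspace sketch of bounding the converted
result instead of the raw input. The helper name tuxi_raw_to_millicelsius()
is hypothetical, the offset value 2730 is taken from the arithmetic quoted
above, and INT32_MAX stands in for the kernel's S32_MAX:

```c
#include <stdint.h>

/* Assumption: offset in tenths of a kelvin, as in the arithmetic quoted above. */
#define TUXI_FW_TEMP_OFFSET 2730LL

/*
 * Hypothetical helper: convert a firmware reading in tenths of a kelvin
 * to millidegrees Celsius, clamping only the final result.
 */
static int32_t tuxi_raw_to_millicelsius(unsigned long long raw)
{
	long long mc;

	/* Guard first so the signed multiplication below cannot overflow. */
	if (raw > (unsigned long long)INT32_MAX)
		return INT32_MAX;

	/* Signed arithmetic, so readings below 0 °C come out negative. */
	mc = ((long long)raw - TUXI_FW_TEMP_OFFSET) * 100;
	if (mc > INT32_MAX)
		return INT32_MAX;

	return (int32_t)mc;
}
```

With this shape, a raw value of 21474837 converts to 2147210700 instead of
being clamped, and only results that really exceed the s32 range hit the
bound.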
> > I see you already sent another version, it would have been prudent to wait
> > a bit longer as you contested some of the comments so you could have seen
> > my replies before sending the next version.
>
> I'm sorry. I just wanted to show that I'm iterating while I wait for the reply
> on whether the design with the periodic safeguard is acceptable. If that gets
> rejected, this driver must be rewritten anyway.
Kernel development is not a sprint. It's better to avoid sending versions
unnecessarily; a day or two isn't worth it when compared with ending up
in people's low priority bin, which will inevitably happen when the
version counter starts to grow beyond v5-6.
I (and likely others too) appreciate it if we don't have to waste review
cycles on something that is not "complete", because we have to look at the
completed one later too. Maintainers work in good faith that developers
are simply improving their patches (or working on some other great
improvements to the kernel :-)) while nothing seemingly happens for a
while. There's no need to prove that something is going on just for the
sake of proving.
Obviously RFC patches are still fine to ask specific questions about
something, but that's not about proving progress (in fact, RFC patches are
more about being "stuck" than about making progress).
> > > > Shouldn't it be like this:
> > > >
> > > > retval -= TUXI_FW_TEMP_OFFSET;
> > > > *val = min(retval * 100, (unsigned long long)S32_MAX);
> > > As retval is unsigned this would not work with (theoretical) negative °C.
> > So your code relies on implicit type conversion in this: (retval -
> > TUXI_FW_TEMP_OFFSET) ?
>
> I can add an explicit cast, np.
>
> [snip]
>
> > > > > + }
> > > > > + if (temp >= temp_high)
> > > > > + ret = i;
> > Now that I reread things, is this also incorrect, as "i" is at the
> > terminator entry at this point?
>
> Yes that's intentional, the 3 entries in the array open up 4 ranges:
>
> lower than 1st entry i=0, between 1st and 2nd entry i=1, 2nd and 3rd i=2,
> higher than 3rd i=3 (the value that terminates the for loop)
I didn't realize that. To me { } looks like just a terminating entry. So
what's the min_speed going to be for that last entry since it's
initialized to 0?
Oh, I see you're taking .min_speed from temp_levels[temp_level - 1], which
I don't like either. You have a "state" and then store min_speed for the
state at a different index inside the array?!?
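
A rough sketch of keeping each state's data at the state's own index, with
an explicit last entry instead of an empty terminator. The struct layout,
the names, and all threshold and speed values here are made up for
illustration and are not taken from the patch:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical layout: one entry per state, holding that state's own data. */
struct temp_level {
	long temp_high;		/* exclusive upper bound, millidegrees C */
	unsigned int min_speed;	/* minimum fan speed while in this state */
};

/* Illustrative values only. */
static const struct temp_level temp_levels[] = {
	{ 80000, 0 },
	{ 90000, 30 },
	{ 100000, 70 },
	{ LONG_MAX, 100 },	/* explicit entry for "above the last threshold" */
};

#define NUM_TEMP_LEVELS (sizeof(temp_levels) / sizeof(temp_levels[0]))

/* Return the index of the state that temp falls into. */
static size_t temp_to_level(long temp)
{
	size_t i;

	for (i = 0; i < NUM_TEMP_LEVELS - 1; i++) {
		if (temp < temp_levels[i].temp_high)
			break;
	}
	return i;
}

/* The state index addresses its own min_speed, no "- 1" needed. */
static unsigned int level_min_speed(long temp)
{
	return temp_levels[temp_to_level(temp)].min_speed;
}
```

Three thresholds still give four ranges, but every range now has its own
entry, so the last one no longer depends on a zero-initialized terminator.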
> > > > > +
> > > > > + temp = retval > S32_MAX / 100 ?
> > > > > + S32_MAX : (retval - TUXI_FW_TEMP_OFFSET) *
> > > > > 100;
> > > > Same math issue comment as above.
> > > >
> > > > Why is the read+conversion code duplicated into two places?
> > > because here it comes with special error handling, and I didn't think
> > > about a separate function for a de facto 2-liner
> > A function that does read+conversion would be 6-8 lines with the error
> > handling.
>
> I can add it.
>
> [snip]
>
> Thanks for the code review again.
>
>
> Last but not least: As already mentioned, I still wonder whether the design
> with the periodic safeguard is ok or not?
I'm not sure if I fully understand Daniel's suggestion [1], as it
doesn't specify who/what is sending that notification to the thermal
engine.
[1] https://lore.kernel.org/all/286f5efc-cd15-4e0b-bec2-2e9bbb93dd37@linaro.org/#t
When it comes to your own concerns, I'm not exactly buying the argument
that userspace can do dangerous things. Yeah, it can shoot itself in the
foot, no doubt, such as by unloading this driver, and there go your periodic
safeguards. If the argument were that userspace fails to respond (in
time), I would have less trouble accepting it.
--
i.