lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 8 Apr 2021 16:54:51 +0200
From:   Thomas Weißschuh <linux@...ssschuh.net>
To:     Hans de Goede <hdegoede@...hat.com>
Cc:     Jean Delvare <jdelvare@...e.com>,
        Guenter Roeck <linux@...ck-us.net>,
        platform-driver-x86@...r.kernel.org,
        Mark Gross <mgross@...ux.intel.com>,
        linux-kernel@...r.kernel.org,
        Barnabás Pőcze <pobrn@...tonmail.com>,
        Matthew Garrett <mjg59@...f.ucam.org>
Subject: Re: [PATCH v2] platform/x86: add Gigabyte WMI temperature driver

Hi Hans,

On Do, 2021-04-08T11:36+0200, Hans de Goede wrote:
> On 4/7/21 9:43 PM, Thomas Weißschuh wrote:
> > Hi Hans,
> > 
> > On Mi, 2021-04-07T17:54+0200, Hans de Goede wrote:
> >> Thank you for your new driver and thank you for the quick respin
> >> addressing Barnabás' request to make it a WMI driver.
> >>
> >> The code looks good, so merging this should be a no-brainer,
> >> yet I'm not sure if I should merge this driver as-is, let me
> >> explain.
> > 
> > thanks for the encouraging words.
> > 
> >> The problem is that I assume that this is based on reverse-engineering?
> > 
> > Yes, it is completely reverse-engineered.
> > Essentially I stumbled upon Matthews comment at
> > https://bugzilla.kernel.org/show_bug.cgi?id=204807#c37 and went from there.
> > 
> >> We have some mixes experiences with reverse-engineered WMI drivers,
> >> sometimes a feature is not supported yet the wmi_evaluate_method()
> >> call happily succeeds. One example of this causing trouble is:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1797d588af15174d4a4e7159dac8c800538e4f8c
> > 
> > There actually are reports of recent, similar mainboards with recent firmware and
> > similar sensor chips that do not support the temperature query.
> > (https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/3 and
> > https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/2)
> > 
> > Unfortunately for unknown WMI queries the firmware does not return any value.
> > This ends up as an ACPI integer with value 0 on the driver side.
> > (Which I could not yet find documentation for if that is expected)
> > In the current version of the driver EIO is returned for 0 values which
> > get translated to N/A by lm-sensors.
> > 
> >> At a minimum I think your driver should check in its
> >> probe function that
> >>
> >> gigabyte_wmi_query_integer(GIGABYTE_WMI_TEMPERATURE_QUERY, ...)
> >>
> >> actually succeeds on the machine the driver is running on chances
> >> are that Gigabyte has been using the DEADBEEF-2001-0000-00A0-C90629100000
> >> GUID for ages, where as the 0x125 value for GIGABYTE_WMI_TEMPERATURE_QUERY
> >> suggests that this is a pretty new API.
> > 
> > Would it be enough to probe all six sensors and check if all return 0?
> 
> I think that failing the probe with -ENODEV, with a dev_info() explaining why when
> all six sensors return 0 would be good yes, that should also fix those 2
> issues on https://github.com/t-8ch/linux-gigabyte-wmi-driver/.

I added such a validation step.

> >> It would be good if you can see if you can find some DSDT-s for older
> >> gigabyte motherboards attached to various distro's bug-tracking systems
> >> or forum posts and see how those respond to an unknown gigabyte_wmi_commandtype.
> > 
> > Will do.
> 
> Since you alreayd have bugreports of boards where this does not work,
> please don't spend too much time on this. I guess those older DSDT-s will
> also just return an integer with value 0.

Ok.

> >> Another open question to make sure this driver is suitable
> >> as a generic driver (and does not misbehave) is how to figure out
> >> how many temperature sensors there actually are.
> > 
> > So far I could not find out how to query this from the firmware.
> > The IT8688 chip can report the state of each sensor but that is not exposed by
> > the firmware.
> > But even the state information from the IT8688 is not accurate as is.
> > One of the sensors that is reported as being active (directly via it87) on my
> > machine always reports -55°C (yes, negative).
> 
> Ok.
> 
> >> Perhaps the WMI interface returns an error when you query an out-of-range
> >> temperature channel?
> > 
> > Also "0" as mentioned above.
> 
> Hmm, so maybe this can be used to limit the amount of reported temperature
> sensors, IOW if sensors 5 and 6 report 0, only register 4 sensors ?

So far the 0-returning sensors have not been at the end of the list but in the
middle. Is it worth building logic to properly probe a bitmask of useful
sensors?

> >> One option here might be to add a DMI matching table and only load on
> >> systems on that table for now. That table could then perhaps also provide
> >> labels for each of the temperature channels, which is something which
> >> would be nice to have regardless of my worries about how well this driver
> >> will work on motherboards on which it has not been tested.
> > 
> > I am collecting reports for working motherboards at
> > https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/1 .
> 
> Good, you should probably ask reporters there to provide the output of:
> 
> grep . /sys/class/dmi/id/* 2> /dev/null

I added a DMI-based whitelist and asked users to submit their DMI information.

> Ran as a normal user (so that the serial-numbers will be skipped) so that
> you will have DMI strings to match on if you decide to go that route.

The serials seem not to be too critical on these boards:

/sys/class/dmi/id/board_serial:Default string
/sys/class/dmi/id/chassis_serial:Default string
/sys/class/dmi/id/product_serial:Default string

> > As mentioned in the initial patch submission there would be different ways to
> > access this information firmware:
> > 
> > * Directly call the underlying ACPI methods (these are present in all so far
> >   observed firmwares, even if not exposed via WMI).
> > * Directly access the ACPI IndexField representing the it87 chip.
> > * Directly access the it87 registers while holding the relevant locks via ACPI.
> > 
> > I assume all of those mechanisms have no place in a proper kernel driver but
> > would like to get your opinion on it.
> 
> Actually the "Directly access the it87 registers" option is potentially interesting
> since it will allow using the it87 driver which gives a lot more features.
> 
> I actually wrote a rough outline of how something like this could work here:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=204807#c47

I must have overread this one, but yes that's also what I had in mind.

> Note I'm not sure if that is the right approach, but it definitely is an
> option. It seems that this one might also solve the X470-AORUS-ULTRA-GAMING
> case (https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/2)
> 
> Hopefully the direct-register ACPI/WMI access methods will also allow
> reading the super-io vendor and product ids so that we can be reasonably
> sure that we are not loading the wrong driver on a board.
> 
> OTOH the WMI-temp method approach may also work on boards where the sensors
> (or some of the sensors) are not supported.
> 
> I'm afraid there is no obviously correct answer here. If you feel like it
> experimenting with the "Directly access the it87 registers" option would certainly
> be interesting IMHO.
> 
> It might be good to get hwmon subsystems maintainer's opinion on this
> before sinking a lot of time into this though (added to the Cc).

Sounds good.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ