lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1dc793cd-d11d-441a-a734-465eb4872b2a@roeck-us.net>
Date: Mon, 27 Jan 2025 10:39:44 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Paul Fertser <fercerpav@...il.com>
Cc: "Winiarska, Iwona" <iwona.winiarska@...el.com>,
 "jae.hyun.yoo@...ux.intel.com" <jae.hyun.yoo@...ux.intel.com>,
 "Rudolph, Patrick" <patrick.rudolph@...ements.com>,
 "pierre-louis.bossart@...ux.dev" <pierre-louis.bossart@...ux.dev>,
 "Solanki, Naresh" <naresh.solanki@...ements.com>,
 "jdelvare@...e.com" <jdelvare@...e.com>,
 "fr0st61te@...il.com" <fr0st61te@...il.com>,
 "linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
 "stable@...r.kernel.org" <stable@...r.kernel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "openbmc@...ts.ozlabs.org" <openbmc@...ts.ozlabs.org>,
 "joel@....id.au" <joel@....id.au>
Subject: Re: [PATCH] hwmon: (peci/dimmtemp) Do not provide fake thresholds
 data

On 1/27/25 10:30, Paul Fertser wrote:
> Hi Guenter,
> 
> On Mon, Jan 27, 2025 at 09:29:39AM -0800, Guenter Roeck wrote:
>> On 1/27/25 08:40, Winiarska, Iwona wrote:
>>> On Thu, 2025-01-23 at 15:20 +0300, Paul Fertser wrote:
>>>> When an Icelake or Sapphire Rapids CPU isn't providing the maximum and
>>>> critical thresholds for particular DIMM the driver should return an
>>>> error to the userspace instead of giving it stale (best case) or wrong
>>>> (the structure contains all zeros after kzalloc() call) data.
>>>>
>>>> The issue can be reproduced by binding the peci driver while the host is
>>>> fully booted and idle, this makes PECI interaction unreliable enough.
>>>>
>>>> Fixes: 73bc1b885dae ("hwmon: peci: Add dimmtemp driver")
>>>> Fixes: 621995b6d795 ("hwmon: (peci/dimmtemp) Add Sapphire Rapids support")
>>>> Cc: stable@...r.kernel.org
>>>> Signed-off-by: Paul Fertser <fercerpav@...il.com>
>>>
>>> Hi!
>>>
>>> Thank you for the patch.
>>> Did you have a chance to test it with OpenBMC dbus-sensors?
>>> In general, the change looks okay to me, but since it modifies the behavior
>>> (applications will need to handle this, and returning an error will happen more
>>> often) we need to confirm that it does not cause any regressions for userspace.
>>>
>>
>> I would also like to understand if the error is temporary or permanent.
>> If it is permanent, the attributes should not be created in the first
>> place. It does not make sense to have limit attributes which always report
>> -ENODATA.
> 
> The error is temporary. The underlying reason is that when host CPUs
> go to deep enough idle sleep state (probably C6) they stop responding
> to PECI requests from BMC. Once something starts running the CPU
> leaves C6 and starts responding and all the temperature data
> (including the thresholds) becomes available again.
> 

Thanks.

Next question: Is there evidence that the thresholds change while the CPU
is in a deep sleep state (or, in other words, that they are indeed stale) ?
Because if not it would be (much) better to only report -ENODATA if the
thresholds are uninitialized, and it would be even better than that if the
limits are read during initialization (and not updated at all) if they do
not change dynamically.

Guenter


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ