Message-ID: <866a6d6b-c75e-26d7-a323-f8840c1228c3@roeck-us.net>
Date:   Thu, 8 Jun 2023 11:03:15 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     "Kannan, Baski" <Baski.Kannan@....com>
Cc:     "Moger, Babu" <Babu.Moger@....com>,
        "clemens@...isch.de" <clemens@...isch.de>,
        "jdelvare@...e.com" <jdelvare@...e.com>,
        "linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Ramayanam, Pavan" <Pavan.Ramayanam@....com>
Subject: Re: [PATCH] hwmon: (k10temp) Report negative temperatures

On 6/8/23 10:09, Kannan, Baski wrote:
> The patch you have mentioned, aef17ca12719, sounds like a work-around for a problem found in some Ryzen Threadripper processors.
> If I understand correctly, this work-around (aef17ca12719) has been provided as a blanket fix for all the processors.
> 

Due to lack of better knowledge and understanding, yes. See
https://github.com/lm-sensors/lm-sensors/issues/70. That doesn't
mean that a blanket revert would be appropriate.

> The Industrial Processor in question is the Epyc3k i3255.
> AMD Family 17h (boot_cpu_data.x86)
> AMD model 00h - 0fh (boot_cpu_data.x86_model)
> Model Name - contains string "3255"
> 
> It supports temperatures ranging from -40 degrees Celsius to 105 degrees Celsius.
> We have customers' machines running at -20 degrees Celsius. They require that the correct temperature be passed to their tools.
> 
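The identification criteria above can be written as a single predicate. This is a userspace illustration, not driver code: `struct cpu_ident` and `is_epyc_3255()` are hypothetical names, and the three fields only mirror `boot_cpu_data.x86`, `boot_cpu_data.x86_model` and `boot_cpu_data.x86_model_id`. The model-name check is what narrows the match, since Family 17h models 00h-0Fh also cover other processors.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the boot CPU identification
 * the kernel keeps in boot_cpu_data.
 */
struct cpu_ident {
	unsigned int family;    /* mirrors boot_cpu_data.x86 */
	unsigned int model;     /* mirrors boot_cpu_data.x86_model */
	const char *model_name; /* mirrors boot_cpu_data.x86_model_id */
};

/*
 * Hypothetical helper: true for Family 17h, models 00h-0Fh, whose
 * model name contains "3255" (the criteria given for the Epyc3k i3255).
 */
static bool is_epyc_3255(const struct cpu_ident *c)
{
	if (c->family != 0x17)
		return false;
	/* model is unsigned, so only the upper bound needs checking */
	if (c->model > 0x0f)
		return false;
	/* Family and model alone also match other Family 17h parts. */
	return c->model_name != NULL && strstr(c->model_name, "3255") != NULL;
}
```

In the driver, a check like this would presumably run once in the probe function rather than on every temperature read.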

We have two options: Either limit the workaround to the list of processors
which may be affected by the original problem, or do not apply it to
processors which are known to _not_ be affected by the problem. Either
can easily be implemented by adding a flag to struct k10temp_data and
setting it in the probe function.
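A per-device flag of the kind described above might look roughly like this. It is a userspace sketch under stated assumptions: `struct k10temp_data_sketch`, its `allow_negative` field and `sketch_read_temp()` are hypothetical stand-ins, not the driver's actual struct k10temp_data or k10temp_read_temp(), and `raw_temp` takes the place of the value returned by get_raw_temp(data). Values are in millidegrees Celsius, as hwmon reports them.

```c
#include <stdbool.h>

/*
 * Hypothetical, trimmed-down stand-in for struct k10temp_data, holding
 * only what this sketch needs. Temperatures are in millidegrees Celsius.
 */
struct k10temp_data_sketch {
	long raw_temp;       /* stand-in for the value of get_raw_temp(data) */
	long temp_offset;    /* offset subtracted to derive Tdie from Tctl */
	bool allow_negative; /* hypothetical flag, set in the probe function */
};

/* Mirrors the Tctl (tdie == false) and Tdie (tdie == true) cases. */
static long sketch_read_temp(const struct k10temp_data_sketch *d, bool tdie)
{
	long val = d->raw_temp;

	if (tdie)
		val -= d->temp_offset;

	/* Keep the existing clamp to zero unless the flag marks the part as unaffected. */
	if (!d->allow_negative && val < 0)
		val = 0;

	return val;
}
```

Whether the flag is defined as opt-in (processors known to be unaffected) or opt-out (processors that may be affected) only changes which CPUs the probe function matches; the read path stays the same.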

No one outside AMD knows which processors may or may not be affected
by the original problem. It was reported on 1950X at the time, but
it may exist on all processors with the ability to set Sense MI Skew
(and possibly Sense MI Offset), whatever that is. With that in mind,
the fix will have to be provided by AMD.

Guenter

> -----Original Message-----
> From: Guenter Roeck <groeck7@...il.com> On Behalf Of Guenter Roeck
> Sent: Thursday, June 8, 2023 8:52 AM
> To: Kannan, Baski <Baski.Kannan@....com>
> Cc: Moger, Babu <Babu.Moger@....com>; clemens@...isch.de; jdelvare@...e.com; linux-hwmon@...r.kernel.org; linux-kernel@...r.kernel.org
> Subject: Re: [PATCH] hwmon: (k10temp) Report negative temperatures
> 
> On Tue, May 23, 2023 at 02:46:46PM -0700, Guenter Roeck wrote:
>> On Tue, May 23, 2023 at 03:49:32PM -0500, Baskaran Kannan wrote:
>>> Currently, the tctl and die temperatures are rounded off to zero if
>>> they are less than 0. There are industrial processors which work
>>> below zero.
>>
>> This was introduced with commit aef17ca12719 ("hwmon: (k10temp) Only
>> apply temperature offset if result is positive"). This patch would
>> effectively revert that change. Given the reason for introducing it I
>> am not convinced that it is a good idea to unconditionally revert it.
>>
> 
> Any comments? I am not inclined to accept this patch as-is. What are the industrial processors? Is there a means to detect them?
> 
> Guenter
> 
>> Guenter
>>
>>>
>>> To display the correct temperature, remove the rounding off.
>>>
>>> Signed-off-by: Baskaran Kannan <Baski.Kannan@....com>
>>> ---
>>>   drivers/hwmon/k10temp.c | 4 ----
>>>   1 file changed, 4 deletions(-)
>>>
>>> diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
>>> index 7b177b9fbb09..489ad0b1bc74 100644
>>> --- a/drivers/hwmon/k10temp.c
>>> +++ b/drivers/hwmon/k10temp.c
>>> @@ -204,13 +204,9 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
>>>              switch (channel) {
>>>              case 0:         /* Tctl */
>>>                      *val = get_raw_temp(data);
>>> -                   if (*val < 0)
>>> -                           *val = 0;
>>>                      break;
>>>              case 1:         /* Tdie */
>>>                      *val = get_raw_temp(data) - data->temp_offset;
>>> -                   if (*val < 0)
>>> -                           *val = 0;
>>>                      break;
>>>              case 2 ... 13:          /* Tccd{1-12} */
>>>                      amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
>>> --
>>> 2.25.1
>>>
