Message-ID: <866a6d6b-c75e-26d7-a323-f8840c1228c3@roeck-us.net>
Date: Thu, 8 Jun 2023 11:03:15 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: "Kannan, Baski" <Baski.Kannan@....com>
Cc: "Moger, Babu" <Babu.Moger@....com>,
"clemens@...isch.de" <clemens@...isch.de>,
"jdelvare@...e.com" <jdelvare@...e.com>,
"linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Ramayanam, Pavan" <Pavan.Ramayanam@....com>
Subject: Re: [PATCH] hwmon: (k10temp) Report negative temperatures
On 6/8/23 10:09, Kannan, Baski wrote:
> The patch you have mentioned, aef17ca12719, sounds like a work-around for a problem found in some Ryzen Threadripper processors.
> If I understand correctly, this work-around (aef17ca12719) has been provided as a blanket fix for all the processors.
>
Due to lack of better knowledge and understanding, yes. See
https://github.com/lm-sensors/lm-sensors/issues/70. That doesn't
mean that a blanket revert would be appropriate.
> The Industrial Processor in question is the Epyc3k i3255.
> AMD Family 17h (boot_cpu_data.x86)
> AMD model 00h - 0fh (boot_cpu_data.x86_model)
> Model Name - contains string "3255"
>
> It supports temperatures ranging from -40 degrees Celsius to 105 degrees Celsius.
> We have customers' machines running at -20 degrees Celsius. They require that the correct temperature be passed to their tools.
>
We have two options: Either limit the workaround to the list of processors
which may be affected by the original problem, or do not apply it to
processors which are known to _not_ be affected by the problem. Either
can easily be implemented by adding a flag to struct k10temp_data and
setting it in the probe function.
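
A minimal user-space sketch of the second option (exempting processors known not to be affected), assuming the detection criteria quoted above (family 17h, model 00h-0fh, model name containing "3255"). The flag name `disp_negative`, the `cpu_info` struct, and both `*_sketch` functions are hypothetical stand-ins for illustration and are not taken from the driver:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver's struct k10temp_data. */
struct k10temp_data {
	int temp_offset;	/* millidegrees Celsius */
	bool disp_negative;	/* hypothetical flag: report sub-zero values */
};

/* Hypothetical stand-in for the boot_cpu_data fields named in the thread. */
struct cpu_info {
	unsigned int family;	/* boot_cpu_data.x86 */
	unsigned int model;	/* boot_cpu_data.x86_model */
	const char *model_name;
};

/* Probe-time decision: exempt only the part known to run below zero. */
static void k10temp_probe_sketch(struct k10temp_data *data,
				 const struct cpu_info *c)
{
	data->disp_negative = c->family == 0x17 && c->model <= 0x0f &&
			      strstr(c->model_name, "3255") != NULL;
}

/* Read path for Tdie: keep the clamp unless the flag was set in probe. */
static long k10temp_tdie_sketch(const struct k10temp_data *data, long raw)
{
	long val = raw - data->temp_offset;

	if (val < 0 && !data->disp_negative)
		val = 0;
	return val;
}
```

With this shape, every processor that is not explicitly exempted keeps the behavior introduced by aef17ca12719, and only the exempted part reports negative values.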
No one outside AMD knows which processors may or may not be affected
by the original problem. It was reported on 1950X at the time, but
it may exist on all processors with the ability to set Sense MI Skew
(and possibly Sense MI Offset), whatever that is. With that in mind,
the fix will have to be provided by AMD.
Guenter
> -----Original Message-----
> From: Guenter Roeck <groeck7@...il.com> On Behalf Of Guenter Roeck
> Sent: Thursday, June 8, 2023 8:52 AM
> To: Kannan, Baski <Baski.Kannan@....com>
> Cc: Moger, Babu <Babu.Moger@....com>; clemens@...isch.de; jdelvare@...e.com; linux-hwmon@...r.kernel.org; linux-kernel@...r.kernel.org
> Subject: Re: [PATCH] hwmon: (k10temp) Report negative temperatures
>
>
> On Tue, May 23, 2023 at 02:46:46PM -0700, Guenter Roeck wrote:
>> On Tue, May 23, 2023 at 03:49:32PM -0500, Baskaran Kannan wrote:
>>> Currently, the tctl and die temperatures are rounded off to zero if
>>> they are less than 0. There are industrial processors which work
>>> below zero.
>>
>> This was introduced with commit aef17ca12719 ("hwmon: (k10temp) Only
>> apply temperature offset if result is positive"). This patch would
>> effectively revert that change. Given the reason for introducing it I
>> am not convinced that it is a good idea to unconditionally revert it.
>>
>
> Any comments? I am not inclined to accept this patch as-is. What are the industrial processors? Is there a means to detect them?
>
> Guenter
>
>> Guenter
>>
>>>
>>> To display the correct temperature remove the rounding off.
>>>
>>> Signed-off-by: Baskaran Kannan <Baski.Kannan@....com>
>>> ---
>>> drivers/hwmon/k10temp.c | 4 ----
>>> 1 file changed, 4 deletions(-)
>>>
>>> diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
>>> index 7b177b9fbb09..489ad0b1bc74 100644
>>> --- a/drivers/hwmon/k10temp.c
>>> +++ b/drivers/hwmon/k10temp.c
>>> @@ -204,13 +204,9 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
>>>  		switch (channel) {
>>>  		case 0:		/* Tctl */
>>>  			*val = get_raw_temp(data);
>>> -			if (*val < 0)
>>> -				*val = 0;
>>>  			break;
>>>  		case 1:		/* Tdie */
>>>  			*val = get_raw_temp(data) - data->temp_offset;
>>> -			if (*val < 0)
>>> -				*val = 0;
>>>  			break;
>>>  		case 2 ... 13:		/* Tccd{1-12} */
>>>  			amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
>>> --
>>> 2.25.1
>>>