Date:   Tue, 22 Dec 2020 00:08:49 -0600
From:   Wei Huang <wei.huang2@....com>
To:     Gabriel C <nix.or.die@...glemail.com>
Cc:     Guenter Roeck <linux@...ck-us.net>, linux-hwmon@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: k10temp: ZEN3 readings are broken



On 12/21/20 11:09 PM, Gabriel C wrote:
> Am Di., 22. Dez. 2020 um 05:33 Uhr schrieb Wei Huang <wei.huang2@....com>:
>>
>>
>>
>> On 12/21/20 9:58 PM, Guenter Roeck wrote:
>>> Hi,
>>>
>>> On 12/21/20 5:45 PM, Gabriel C wrote:
>>>> Hello Guenter,
>>>>
>>>> while trying to add ZEN3 support to the out-of-tree zenpower module, I found
>>>> out that the in-kernel k10temp driver is broken with ZEN3 (and partially
>>>> with ZEN2 too).
>>>>
>>>> commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e added:
>>>>
>>>> case 0x0 ... 0x1:       /* Zen3 */
>>>>
>>>> however, this is wrong: we look for a model, which is 0x21 for ZEN3;
>>>> these seem to be steppings?
>>
>> These are model numbers for server CPUs. I believe 0x21 is for desktop
>> CPUs. In other words, current upstream code doesn't support your CPUs.
>> You are welcome to add support for 0x21, but it is wrong to remove
>> support for 0x00/0x01.
> 
> I figured that out myself after seeing what was committed to the amd_energy
> driver. It would have been better if you, as the author of the patch, had
> written a clearer commit message to start with.
> 
> 
> commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e
> Author: Wei Huang <wei.huang2@....com>
> Date:   Mon Sep 14 15:07:15 2020 -0500
> 
>     hwmon: (k10temp) Add support for Zen3 CPUs
> ....
> 
> Which you didn't. That should read:
> 
> "Added support for NOT yet released SP3 ZEN3 CPU"
> 
> Right?

Yes. The subject line would have been clearer as something like "Add
support for Zen3 Server and TR CPUs".

> 
>>
>>>>
>>>> Also, PLANE0/1 are wrong too: Icore has zero readouts even when fixing
>>>> the model.
>>>>
>>>> Looking at these (there is something missing for 0x71 ZEN2 Ryzens
>>>> also), that should be:
>>>>
>>>> PLANE0  (ZEN_SVI_BASE + 0x10)
>>>> PLANE1  (ZEN_SVI_BASE + 0xc)
>>
>> Same problem here with model 0x71. 0x31 is for server CPUs.
> 
> Yes, that is why I split both in my 'guess what the eff is this about' patch.
> 
> 0x31 is TR 3000 / SP3 ZEN2, while 0x71 is ZEN2 Desktop.
>>
>>>>
>>>> Which is the same as for ZEN2 >= 0x71. Since this is not really
>>>> documented and I have some confirmations of these numbers from
>>>> *somewhere* :-) I created a demo patch only.
>>>>
>>>> I would like AMD people to really have a look at the driver and
>>>> confirm the changes, since getting information from *somewhere*
>>>> doesn't mean it is 100% correct. However, the driver is working
>>>> with these changes.
>>>>
>>>> In any case, the model needs changing to 0x21 even if we leave the other
>>>> readings broken.
>>>>
>>>> Here is my demo patch:
>>>>
>>>> https://crazy.dev.frugalware.org/fix-ZEN2-ZEN3-test1.patch
>>
>> For family 19h, the patch should look like the following. But this might
>> not matter anymore, as suggested by Guenter below.
>>
>>    /* F19h thermal registers through SMN */
>> #define F19H_M01_SVI_TEL_PLANE0                 (ZEN_SVI_BASE + 0x14)
>> #define F19H_M01_SVI_TEL_PLANE1                 (ZEN_SVI_BASE + 0x10)
>> +/* Zen3 Ryzen */
>> +#define F19H_M21H_SVI_TEL_PLANE0               (ZEN_SVI_BASE + 0x10)
>> +#define F19H_M21H_SVI_TEL_PLANE1               (ZEN_SVI_BASE + 0xc)
>>
>> Then add the following change:
>>
>>                  switch (boot_cpu_data.x86_model) {
>>                  case 0x0 ... 0x1:       /* Zen3 */
>>                          data->show_current = true;
>>                          data->svi_addr[0] = F19H_M01_SVI_TEL_PLANE0;
>>                          data->svi_addr[1] = F19H_M01_SVI_TEL_PLANE1;
>>                          data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
>>                          data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
>>                          k10temp_get_ccd_support(pdev, data, 8);
>>                          break;
>> +               case 0x21:      /* Zen3 */
>> +                       data->show_current = true;
>> +                       data->svi_addr[0] = F19H_M21H_SVI_TEL_PLANE0;
>> +                       data->svi_addr[1] = F19H_M21H_SVI_TEL_PLANE1;
>> +                       data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
>> +                       data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
>> +                       k10temp_get_ccd_support(pdev, data, 8);
>> +                       break;
>>
>>>>
> 
> You are a really funny guy.
> After _all_, these are YOUR company's CPUs, and you want me to fix these
> without docs? Sure I can, but the confusion started with your wrong commit
> message.

Sorry for the confusion. The review comments above were merely meant to
point out that server parts won't be supported if 0x0..0x1 is removed. I do
appreciate the test results and bug report. The original commit
unfortunately doesn't work on your CPUs. It was indeed a misfire on my
side.
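
To make the point about keeping both ranges concrete, here is a minimal
user-space sketch of the model dispatch, not driver code. The struct and
the helper f19h_svi_planes() are hypothetical names used only for
illustration, the ZEN_SVI_BASE value is an assumption that should be
checked against k10temp.c, and the 0x21 offsets are the unconfirmed ones
reported in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed base address, for illustration only; verify against k10temp.c. */
#define ZEN_SVI_BASE			0x0005A000u

/* Server / TR offsets as quoted in this thread. */
#define F19H_M01_SVI_TEL_PLANE0		(ZEN_SVI_BASE + 0x14)
#define F19H_M01_SVI_TEL_PLANE1		(ZEN_SVI_BASE + 0x10)
/* Desktop offsets as reported in this thread; not confirmed by docs. */
#define F19H_M21H_SVI_TEL_PLANE0	(ZEN_SVI_BASE + 0x10)
#define F19H_M21H_SVI_TEL_PLANE1	(ZEN_SVI_BASE + 0xc)

/* Hypothetical container; the real driver uses struct k10temp_data. */
struct svi_planes {
	uint32_t addr[2];
};

/*
 * Hypothetical helper: fill *out for a known family 19h model and
 * return true, or return false for a model with no known offsets.
 */
static bool f19h_svi_planes(unsigned int model, struct svi_planes *out)
{
	switch (model) {
	case 0x0:
	case 0x1:	/* Zen3 server / TR */
		out->addr[0] = F19H_M01_SVI_TEL_PLANE0;
		out->addr[1] = F19H_M01_SVI_TEL_PLANE1;
		return true;
	case 0x21:	/* Zen3 Ryzen desktop */
		out->addr[0] = F19H_M21H_SVI_TEL_PLANE0;
		out->addr[1] = F19H_M21H_SVI_TEL_PLANE1;
		return true;
	default:	/* unknown model: report nothing rather than guess */
		return false;
	}
}
```

Each case returns on its own, so the server entry cannot fall through
into the desktop one and have its addresses overwritten.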

> 
> Besides, is that how AMD operates now?
> Let the customer pay thousands of euros for HW and then tell
> him to fix or add driver support himself? Very interesting.
> 
> And yes it matters even after removing these.
> 
> case 0x0 ... 0x1:       /* Zen3 SP3 ( NOT YET RELEASED ) */
> case 0x21:      /* Zen3 Ryzen Desktop  */
>     ....
> 
> Right?
> 
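
For anyone checking which of these case labels a given part falls under,
the family, model and stepping values come from CPUID leaf 1 EAX. Below
is a minimal sketch of that decoding. The struct and function names are
hypothetical, and the logic follows the usual x86 convention of combining
base and extended fields rather than being copied from the kernel.

```c
#include <stdint.h>

/* Hypothetical container for a decoded CPUID leaf 1 EAX signature. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Hypothetical helper: split a raw CPUID.1:EAX value into its fields. */
static struct cpu_sig decode_cpuid_eax(uint32_t eax)
{
	struct cpu_sig s;
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base_model = (eax >> 4) & 0xf;
	unsigned int ext_family = (eax >> 20) & 0xff;
	unsigned int ext_model = (eax >> 16) & 0xf;

	s.stepping = eax & 0xf;

	/* The extended family is only added when the base family is 0xf. */
	s.family = base_family;
	if (base_family == 0xf)
		s.family += ext_family;

	/* The extended model supplies the high nibble of the model. */
	s.model = base_model;
	if (base_family >= 0x6)
		s.model |= ext_model << 4;

	return s;
}
```

As a worked case, a raw value of 0x00A20F10 decodes to family 0x19,
model 0x21, stepping 0x0, which shows that 0x21 is a model number and
not a stepping.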
