lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4dec1984-b406-7b54-7f8e-7ac93d4eb6f0@linux.intel.com>
Date:   Tue, 22 Feb 2022 18:50:51 +0530
From:   Vikash Chandola <vikash.chandola@...ux.intel.com>
To:     Guenter Roeck <linux@...ck-us.net>, iwona.winiarska@...el.com,
        jdelvare@...e.com, linux-hwmon@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] hwmon: (pmbus) Clear pmbus fault/warning bits before read



On 2/11/2022 1:27 AM, Guenter Roeck wrote:
> On 2/10/22 11:55, Guenter Roeck wrote:
>> On 2/10/22 10:29, Vikash Chandola wrote:
>>>
>>>
>>> On 2/10/2022 10:55 PM, Guenter Roeck wrote:
>>>> On 2/10/22 07:57, Vikash Chandola wrote:
>>>>>
>>>>>
>>>>> On 2/10/2022 8:14 PM, Guenter Roeck wrote:
>>>>>> On 2/10/22 04:41, Vikash Chandola wrote:
>>>>>>> pmbus fault and warning bits are not cleared by itself once 
>>>>>>> fault/warning
>>>>>>> condition is not valid anymore. As per pmbus datasheet faults 
>>>>>>> must be
>>>>>>> cleared by user.
>>>>>>> Modify hwmon behavior to clear latched status bytes if any bit in 
>>>>>>> status
>>>>>>> register is high prior to returning fresh data to userspace. If
>>>>>>> fault/warning conditions are still applicable fault/warning bits 
>>>>>>> will be
>>>>>>> set and we will get updated data in second read.
>>>>>>>
>>>>>>> Hwmon behavior is changed here. Now sysfs reads will reflect latest
>>>>>>> values from pmbus slave, not latched values.
>>>>>>> In case a transient warning/fault has happened in the past, it 
>>>>>>> will no
>>>>>>> longer be reported to userspace.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> NACK.
>>>>>>
>>>>>> Reporting that information is exactly the point of the current code.
>>>>>> We _do_ want to report at least once that a problem occurred in 
>>>>>> the past,
>>>>>> and only clear the warning flag(s) afterwards.
>>>>>>
>>>>>> Guenter
>>>>>>
>>>>> But as per current implementation we will continue to report fault 
>>>>> even after fault condition is cleared. I could not find sysfs entry 
>>>>> or any other means by which user/kernel can clear the 
>>>>> faults/warnings after it is reported. Please point if I am missing 
>>>>> anything.
>>>>>
>>>>
>>>> Normally a chip should clear the status after the fault condition 
>>>> cleared
>>>> and the register was read. At least that was my experience so far.
>>>> What chip (or chips) don't do that ?
>>>>
>>>> Guenter
>>>
>>> I see that pmbus spec says that once fault/warning bits are set
>>> they won't be cleared by itself. Section 10.2.3 of
>>> "PMBus Specification Rev. 1.3.1 Part II 2015-03-13" from
>>> https://pmbus.org/specification-archives/   says
>>>
>>> "
>>> Almost all of the warning or fault bits set in the status registers 
>>> remain set, even if the fault or warning condition is removed
>>> or corrected, until one of the following occur:
>>> * The bit is individually cleared (Section 10.2.4),
>>> * The device receives a CLEAR_FAULTS command (Section 15.1),
>>> * A RESET signal (if one exists) is asserted,
>>> * The output is commanded through the CONTROL pin, the OPERATION 
>>> command, or the combined action of the CONTROL pin and OPERATION
>>> command, to turn off and then to turn back on, or
>>> * Bias power is removed from the PMBus device...
>>> "
>>>
>>>  From this I concluded that slave(PSU) following pmbus spec will not
>>> clear the fault/warning in status registers.
>>> I don't have exact chip details handy but I can get it by tomorrow.
>>> However this looks to be issue not related to specific chip ?
>>>
>>
>> Good point. Interesting that I have not seen this before.
>>
>>> If this is a problem, how should I approach this ? Shall I create a new
>>> sysfs entry that user space application can invoke and clear all faults
>>> or provide sysfs entry to clear individual fault/warning bits or
>>> something else ?
>>>
>>
>> No. What we need to do is to add code to pmbus_get_boolean() and 
>> selectively
>> write the reported status bit back into the status register. Something 
>> like
>>      ...
>>      regval = status & mask;
>>      if (regval) {
>>          ret = pmbus_write_status(client, page, reg, regval);
>>                        ^^^^^^^^^^^^^^^^^^ new function
>>          if (ret)
>>              goto unlock;
>>      }
>>
>> and that will need to be tested on a variety of real hardware.
Hi Guenter,
I see that there is already a method pmbus_write_byte_data that does
same thing. I will post patch using that method to update one bit.
>>
> 
> Plus, of course, we need to confirm that there is at least one chip which
> does follow the specification and does not auto-clear alarms.
We observed above behavior SOLUM G36234-015 PSU.
Apologies for delay. It took some time to get information from our PSU
experts that PSU firmware itself is not broken here.
> 
> Guenter

-- 
Vikash

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ