[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAK_vbW2QFk8wJrK6X+Xyvefx1XDPLHOFoh0VpKnSCNN43knwMw@mail.gmail.com>
Date: Thu, 17 Mar 2022 18:30:10 -0500
From: Brandon Wyman <bjwyman@...il.com>
To: Guenter Roeck <linux@...ck-us.net>
Cc: Joel Stanley <joel@....id.au>,
OpenBMC Maillist <openbmc@...ts.ozlabs.org>,
Eddie James <eajames@...ux.ibm.com>,
Jean Delvare <jdelvare@...e.com>, linux-hwmon@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] hwmon: (pmbus/ibm-cffps) Add clear_faults debugfs entry
On Thu, Mar 17, 2022 at 1:50 PM Guenter Roeck <linux@...ck-us.net> wrote:
>
> On 3/17/22 09:12, Brandon Wyman wrote:
> > On Wed, Mar 16, 2022 at 3:14 PM Guenter Roeck <linux@...ck-us.net> wrote:
> >>
> >> On 3/16/22 13:03, Brandon Wyman wrote:
> >>> On Sun, Mar 13, 2022 at 11:36 PM Guenter Roeck <linux@...ck-us.net> wrote:
> >>>>
> >>>> On 3/11/22 10:10, Brandon Wyman wrote:
> >>>>> Add a clear_faults write-only debugfs entry for the ibm-cffps device
> >>>>> driver.
> >>>>>
> >>>>> Certain IBM power supplies require clearing some latched faults in order
> >>>>> to indicate that the fault has indeed been observed/noticed.
> >>>>>
> >>>>
> >>>> That is insufficient, sorry. Please provide the affected power supplies as
> >>>> well as the affected faults, and confirm that the problem still exists
> >>>> in v5.17-rc6 or later kernels - or, more specifically, in any kernel which
> >>>> includes commit 35f165f08950 ("hwmon: (pmbus) Clear pmbus fault/warning
> >>>> bits after read").
> >>>>
> >>>> Thanks,
> >>>> Guenter
> >>>
> >>> Sorry for the delay in responding. I did some testing with commit
> >>> 35f165f08950. I could not get that code to send the CLEAR_FAULTS
> >>> command to the power supplies.
> >>>
> >>> I can update the commit message to be more specific about which power
> >>> supplies need this CLEAR_FAULTS sent, and which faults. It is observed
> >>> with the 1600W power supplies (2B1E model). The faults that latch are
> >>> the VIN_UV and INPUT faults in the STATUS_WORD. The corresponding
> >>> STATUS_INPUT fault bits are VIN_UV_FAULT and Unit is Off.
> >>>
> >>
> >> The point is that the respective fault bits should be reset when the
> >> corresponding alarm attributes are read. This isn't about executing
> >> a CLEAR_FAULTS command, but about selectively resetting fault bits
> >> while ensuring that faults are reported at least once. Executing
> >> CLEAR_FAULTS is a big hammer.
> >>
> >> With the patch I pointed to in place, input (and other) faults should
> >> be reset after the corresponding alarm attributes are read, assuming
> >> that the condition no longer exists. If that does not happen, we should
> >> fix the problem instead of deploying the big hammer.
> >>
> >> Thanks,
> >> Guenter
> >
> > Okay, I see what you are pointing out there. I had been mostly looking
> > at the "files" in the debugfs paths. Those do not end up running
> > through that pmbus_get_boolean() function, so the individual fault
> > clearing was not being attempted. The fault I was interested in
> > appears to be associated with in1_lcrti_alarm. Reading that will give
> > me a 1 if there is a VIN_UV fault, and then it sends 0x10 to
> > STATUS_INPUT. That clears out VIN_UV, but the STATUS_INPUT command was
> > returning 0x18. Nothing appears to handle clearing BIT(3), that 0x08
> > mask.
> >
> > Should there be some kind of define for BIT(3) over in pmbus.h?
> > Something like PB_VOLTAGE_OFF? Somehow we need something using that in
> > sbit of the attributes. I had a quick hack that just OR'ed BIT(3) with
> > BIT(4) for that PB_VOLTAGE_UV_FAULT. That resulted in a clear of both
> > bits in STATUS_INPUT, and the faults clearing in STATUS_WORD.
> >
> > It is not clear if there should be a separate alarm for that "Unit Off
> > For Insufficient Input Voltage", or if the one for in1_lcrit_alarm
> > could just be the two bits OR'ed into one mask. I can send a patch
> > with a proposal on how to fix this one bit not getting cleared.
> >
>
> We don't have a separate standard attribute. I think the best approach
> would be to add a mask for bit 3 and or that mask for lcrit in
> vin_limit_attrs with PB_VOLTAGE_UV_FAULT. I'd suggest to name the
> define something like PB_VOLTAGE_VIN_OFF or PB_VOLTAGE_VIN_FAULT
> to clarify that the bit applies to the input.
Done. See: https://lore.kernel.org/linux-hwmon/20220317232123.2103592-1-bjwyman@gmail.com/T/#u
>
> Thanks,
> Guenter
Powered by blists - more mailing lists