[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAK_vbW1Lfroo91cMxsLpuf-uuDwcsssG1=fjp3an_O5-FUHjMQ@mail.gmail.com>
Date: Thu, 17 Mar 2022 11:12:49 -0500
From: Brandon Wyman <bjwyman@...il.com>
To: Guenter Roeck <linux@...ck-us.net>
Cc: Joel Stanley <joel@....id.au>,
OpenBMC Maillist <openbmc@...ts.ozlabs.org>,
Eddie James <eajames@...ux.ibm.com>,
Jean Delvare <jdelvare@...e.com>, linux-hwmon@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] hwmon: (pmbus/ibm-cffps) Add clear_faults debugfs entry
On Wed, Mar 16, 2022 at 3:14 PM Guenter Roeck <linux@...ck-us.net> wrote:
>
> On 3/16/22 13:03, Brandon Wyman wrote:
> > On Sun, Mar 13, 2022 at 11:36 PM Guenter Roeck <linux@...ck-us.net> wrote:
> >>
> >> On 3/11/22 10:10, Brandon Wyman wrote:
> >>> Add a clear_faults write-only debugfs entry for the ibm-cffps device
> >>> driver.
> >>>
> >>> Certain IBM power supplies require clearing some latched faults in order
> >>> to indicate that the fault has indeed been observed/noticed.
> >>>
> >>
> >> That is insufficient, sorry. Please provide the affected power supplies as
> >> well as the affected faults, and confirm that the problem still exists
> >> in v5.17-rc6 or later kernels - or, more specifically, in any kernel which
> >> includes commit 35f165f08950 ("hwmon: (pmbus) Clear pmbus fault/warning
> >> bits after read").
> >>
> >> Thanks,
> >> Guenter
> >
> > Sorry for the delay in responding. I did some testing with commit
> > 35f165f08950. I could not get that code to send the CLEAR_FAULTS
> > command to the power supplies.
> >
> > I can update the commit message to be more specific about which power
> > supplies need this CLEAR_FAULTS sent, and which faults. It is observed
> > with the 1600W power supplies (2B1E model). The faults that latch are
> > the VIN_UV and INPUT faults in the STATUS_WORD. The corresponding
> > STATUS_INPUT fault bits are VIN_UV_FAULT and Unit is Off.
> >
>
> The point is that the respective fault bits should be reset when the
> corresponding alarm attributes are read. This isn't about executing
> a CLEAR_FAULTS command, but about selectively resetting fault bits
> while ensuring that faults are reported at least once. Executing
> CLEAR_FAULTS is a big hammer.
>
> With the patch I pointed to in place, input (and other) faults should
> be reset after the corresponding alarm attributes are read, assuming
> that the condition no longer exists. If that does not happen, we should
> fix the problem instead of deploying the big hammer.
>
> Thanks,
> Guenter
Okay, I see what you are pointing out there. I had been mostly looking
at the "files" in the debugfs paths. Those do not end up running
through that pmbus_get_boolean() function, so the individual fault
clearing was not being attempted. The fault I was interested in
appears to be associated with in1_lcrti_alarm. Reading that will give
me a 1 if there is a VIN_UV fault, and then it sends 0x10 to
STATUS_INPUT. That clears out VIN_UV, but the STATUS_INPUT command was
returning 0x18. Nothing appears to handle clearing BIT(3), that 0x08
mask.
Should there be some kind of define for BIT(3) over in pmbus.h?
Something like PB_VOLTAGE_OFF? Somehow we need something using that in
sbit of the attributes. I had a quick hack that just OR'ed BIT(3) with
BIT(4) for that PB_VOLTAGE_UV_FAULT. That resulted in a clear of both
bits in STATUS_INPUT, and the faults clearing in STATUS_WORD.
It is not clear if there should be a separate alarm for that "Unit Off
For Insufficient Input Voltage", or if the one for in1_lcrit_alarm
could just be the two bits OR'ed into one mask. I can send a patch
with a proposal on how to fix this one bit not getting cleared.
>
> >>
> >>> Signed-off-by: Brandon Wyman <bjwyman@...il.com>
> >>> ---
> >>> V1 -> V2: Explain why this change is needed
> >>>
> >>> drivers/hwmon/pmbus/ibm-cffps.c | 11 +++++++++++
> >>> 1 file changed, 11 insertions(+)
> >>>
> >>> diff --git a/drivers/hwmon/pmbus/ibm-cffps.c b/drivers/hwmon/pmbus/ibm-cffps.c
> >>> index e3294a1a54bb..3f02dde02a4b 100644
> >>> --- a/drivers/hwmon/pmbus/ibm-cffps.c
> >>> +++ b/drivers/hwmon/pmbus/ibm-cffps.c
> >>> @@ -67,6 +67,7 @@ enum {
> >>> CFFPS_DEBUGFS_CCIN,
> >>> CFFPS_DEBUGFS_FW,
> >>> CFFPS_DEBUGFS_ON_OFF_CONFIG,
> >>> + CFFPS_DEBUGFS_CLEAR_FAULTS,
> >>> CFFPS_DEBUGFS_NUM_ENTRIES
> >>> };
> >>>
> >>> @@ -274,6 +275,13 @@ static ssize_t ibm_cffps_debugfs_write(struct file *file,
> >>> if (rc)
> >>> return rc;
> >>>
> >>> + rc = 1;
> >>> + break;
> >>> + case CFFPS_DEBUGFS_CLEAR_FAULTS:
> >>> + rc = i2c_smbus_write_byte(psu->client, PMBUS_CLEAR_FAULTS);
> >>> + if (rc < 0)
> >>> + return rc;
> >>> +
> >>> rc = 1;
> >>> break;
> >>> default:
> >>> @@ -607,6 +615,9 @@ static int ibm_cffps_probe(struct i2c_client *client)
> >>> debugfs_create_file("on_off_config", 0644, ibm_cffps_dir,
> >>> &psu->debugfs_entries[CFFPS_DEBUGFS_ON_OFF_CONFIG],
> >>> &ibm_cffps_fops);
> >>> + debugfs_create_file("clear_faults", 0200, ibm_cffps_dir,
> >>> + &psu->debugfs_entries[CFFPS_DEBUGFS_CLEAR_FAULTS],
> >>> + &ibm_cffps_fops);
> >>>
> >>> return 0;
> >>> }
> >>
>
Powered by blists - more mailing lists