Message-ID: <20180626183244.GB4307@roeck-us.net>
Date: Tue, 26 Jun 2018 11:32:44 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Vadim Pasternak <vadimp@...lanox.com>
Cc: Andrew Lunn <andrew@...n.ch>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"rui.zhang@...el.com" <rui.zhang@...el.com>,
"edubezval@...il.com" <edubezval@...il.com>,
"jiri@...nulli.us" <jiri@...nulli.us>, mlxsw <mlxsw@...lanox.com>,
Michael Shych <michaelsh@...lanox.com>
Subject: Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon interface
with FAN fault attribute
On Tue, Jun 26, 2018 at 04:47:05PM +0000, Vadim Pasternak wrote:
>
>
> > -----Original Message-----
> > From: Guenter Roeck [mailto:linux@...ck-us.net]
> > Sent: Tuesday, June 26, 2018 7:33 PM
> > To: Vadim Pasternak <vadimp@...lanox.com>
> > Cc: Andrew Lunn <andrew@...n.ch>; davem@...emloft.net;
> > netdev@...r.kernel.org; rui.zhang@...el.com; edubezval@...il.com;
> > jiri@...nulli.us; mlxsw <mlxsw@...lanox.com>; Michael Shych
> > <michaelsh@...lanox.com>
> > Subject: Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon interface
> > with FAN fault attribute
> >
> > On Tue, Jun 26, 2018 at 02:47:01PM +0000, Vadim Pasternak wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Andrew Lunn [mailto:andrew@...n.ch]
> > > > Sent: Tuesday, June 26, 2018 5:29 PM
> > > > To: Vadim Pasternak <vadimp@...lanox.com>
> > > > Cc: davem@...emloft.net; netdev@...r.kernel.org; linux@...ck-us.net;
> > > > rui.zhang@...el.com; edubezval@...il.com; jiri@...nulli.us; mlxsw
> > > > <mlxsw@...lanox.com>; Michael Shych <michaelsh@...lanox.com>
> > > > Subject: Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon
> > > > interface with FAN fault attribute
> > > >
> > > > > +static ssize_t mlxsw_hwmon_fan_fault_show(struct device *dev,
> > > > > + struct device_attribute *attr,
> > > > > + char *buf)
> > > > > +{
> > > > > + struct mlxsw_hwmon_attr *mlwsw_hwmon_attr =
> > > > > + container_of(attr, struct mlxsw_hwmon_attr, dev_attr);
> > > > > + struct mlxsw_hwmon *mlxsw_hwmon = mlwsw_hwmon_attr->hwmon;
> > > > > + char mfsm_pl[MLXSW_REG_MFSM_LEN];
> > > > > + u16 tach;
> > > > > + int err;
> > > > > +
> > > > > + mlxsw_reg_mfsm_pack(mfsm_pl, mlwsw_hwmon_attr->type_index);
> > > > > + err = mlxsw_reg_query(mlxsw_hwmon->core, MLXSW_REG(mfsm), mfsm_pl);
> > > > > + if (err) {
> > > > > + dev_err(mlxsw_hwmon->bus_info->dev, "Failed to query fan\n");
> > > > > + return err;
> > > > > + }
> > > > > + tach = mlxsw_reg_mfsm_rpm_get(mfsm_pl);
> > > > > +
> > > > > + return sprintf(buf, "%u\n", (tach < mlxsw_hwmon->tach_min) ? 1 : 0);
> > > > > +}
> > > >
> > > > Documentation/hwmon/sysfs-interface says:
> > > >
> > > > Alarms are direct indications read from the chips. The drivers do
> > > > NOT make comparisons of readings to thresholds. This allows
> > > > violations between readings to be caught and alarmed. The exact
> > > > definition of an alarm (for example, whether a threshold must be met
> > > > or must be exceeded to cause an alarm) is chip-dependent.
> > > >
> > > > Now, this is a fault, not an alarm. But does the same apply?
> > >
> > Yes, it does. There are no "soft" alarms / faults.
> >
> > > Hi Andrew,
> > >
> > > The hardware provides a minimum value for the tachometer.
> > > A tachometer is considered faulty if its speed is below this value.
> >
> > This is for user space to decide, not for the kernel.
>
> Hi Guenter,
>
> Are you suggesting that we expose fan{x}_min instead of fan{x}_fault,
> and let the user compare fan{x}_input against fan{x}_min to make
> the fault decision?
>
fanX_min only makes sense if programmed into or reported by the chip
or controller (that is what the attribute is for), usually to enable
the chip/controller to set an alarm. If the chip or controller does
not have a minimum speed register, the attribute should not exist,
and any decision based on a comparison between a minimum fan speed
and the actual fan speed is a user space problem.
I don't know what the tach_min calculation is about, but setting
it to the minimum of all tachometer speeds (or of all reported
minimums?) is not the task of a hwmon driver. A hwmon driver
reports what it gets from hardware; the interpretation is up
to other parts of the system (e.g. user space or the thermal
subsystem). That includes a software-based decision if an alarm
or fault should be reported or not.
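To illustrate that split, here is a minimal user-space sketch in C, assuming the standard hwmon sysfs layout in which fanN_input and fanN_min report fan speed in RPM. The hwmon directory, the function names and the policy_min fallback are hypothetical and not part of the mlxsw patch: the threshold is taken from fanN_min only when the driver exposes it (because the hardware reports one), and otherwise comes from a value chosen by user-space policy.

```c
#include <stdio.h>

/* Read one integer value from a hwmon sysfs attribute file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_long_attr(const char *path, long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%ld", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Hypothetical helper: decide in user space whether fan 'idx' under the
 * hwmon directory 'dir' runs below its minimum speed.  fanN_min is used
 * only if the driver provides it; otherwise 'policy_min' (a user-space
 * choice, in RPM) is the threshold.
 * Returns 1 if below, 0 if not, -1 if fanN_input cannot be read. */
int fan_below_min(const char *dir, int idx, long policy_min)
{
	char path[256];
	long rpm, min;

	snprintf(path, sizeof(path), "%s/fan%d_input", dir, idx);
	if (read_long_attr(path, &rpm))
		return -1;

	snprintf(path, sizeof(path), "%s/fan%d_min", dir, idx);
	if (read_long_attr(path, &min))
		min = policy_min;	/* no hardware minimum: use policy */

	return rpm < min;
}
```

For example, fan_below_min("/sys/class/hwmon/hwmon0", 1, 3000) returns 1 when fan1 runs below fan1_min, or below 3000 RPM if the driver does not provide fan1_min; the hwmon0 index and the 3000 RPM value are placeholders.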
> >
> > > If any tachometer is faulty, PWM should be set to 100%, according
> > > to the system requirements, until the fault
> >
> > system requirements. Again, this is for user space to decide.
>
>
> Yes, user space should decide in this case, and I wanted to provide
> fan{x}_fault to the user for this purpose. But it could, of course,
> also decide based on the input and min attributes.
>
Note that "fault" and "alarm" have distinct meanings.
Many fan controllers can detect if a fan is faulty (e.g. no sensor
connected, or it is otherwise deemed faulty) or if it just runs too slowly.
The typical remedy is also different: A slow fan may just need
more pwm or voltage, a faulty fan needs to be replaced.
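A minimal sketch of how user space might act on that distinction, assuming fanN_fault is read from a driver whose hardware actually reports faults, and that min_rpm is a threshold chosen by user space. The enum, the function names, the pwm step of 32 and the full-speed-on-fault reaction (the platform policy described earlier in this thread) are illustrative assumptions; pwm values use the 0-255 range of the hwmon pwmN attribute.

```c
/* Hypothetical user-space view of one fan's state. */
enum fan_state { FAN_OK, FAN_SLOW, FAN_FAULT };

/* 'fault' is the value read from fanN_fault, 'rpm' from fanN_input,
 * and 'min_rpm' is the user-space threshold. */
enum fan_state classify_fan(int fault, long rpm, long min_rpm)
{
	if (fault)
		return FAN_FAULT;	/* e.g. no sensor, or deemed broken */
	if (rpm < min_rpm)
		return FAN_SLOW;	/* running, just too slowly */
	return FAN_OK;
}

/* Hypothetical policy for the next pwm value (0-255): a slow fan gets
 * more pwm; a faulty fan cannot be fixed by pwm, so the fans are driven
 * at full speed until the faulty unit is replaced. */
int next_pwm(enum fan_state state, int cur_pwm)
{
	switch (state) {
	case FAN_FAULT:
		return 255;
	case FAN_SLOW:
		return cur_pwm + 32 > 255 ? 255 : cur_pwm + 32;
	default:
		return cur_pwm;
	}
}

/* Remedy to report to the operator for each state. */
const char *fan_remedy(enum fan_state state)
{
	switch (state) {
	case FAN_FAULT:
		return "replace fan";
	case FAN_SLOW:
		return "increase pwm or voltage";
	default:
		return "none";
	}
}
```

Keeping this logic in user space lets each platform pick its own thresholds and reactions without changing what the driver reports.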
Guenter
> >
> > > is cleared (e.g. by physically replacing the bad unit).
> > > This is the motivation for exposing fan{x}_fault the way it is.
> > >
> > > Thanks,
> > > Vadim.
> > >
> > > >
> > > > Andrew