[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180626170012.GA28370@roeck-us.net>
Date: Tue, 26 Jun 2018 10:00:12 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Andrew Lunn <andrew@...n.ch>
Cc: Vadim Pasternak <vadimp@...lanox.com>, linux-pm@...r.kernel.org,
netdev@...r.kernel.org, rui.zhang@...el.com, edubezval@...il.com,
jiri@...nulli.us
Subject: Re: [patch net-next RFC 03/12] mlxsw: core: Add core environment
module for port temperature reading
On Tue, Jun 26, 2018 at 04:22:38PM +0200, Andrew Lunn wrote:
> On Tue, Jun 26, 2018 at 12:10:28PM +0000, Vadim Pasternak wrote:
>
> Adding the linux-pm@...r.kernel.org list.
>
> > Add new core_env module to allow port temperature reading. This
> > information has most critical impact on system's thermal monitoring and
> > is to be used by core_hwmon and core_thermal modules.
> >
> > New internal API reads the temperature from all the modules, which are
> > equipped with the thermal sensor and exposes temperature according to
> > the worst measure. All individual temperature values are normalized to
> > pre-defined range.
>
> This patchset has been sent to the netdev list before. I raised a few
> questions about this, which is why it is now being posted to a bigger
> group for review.
>
> The hardware has up to 64 temperature sensors. These sensors are
> hot-plugable, since they are inside SFP modules, which are
> hot-plugable. Different SFP modules can have different operating
> temperature ranges. They contain an EEPROM which lists upper and lower
> warning and fail temperatures, and report alarms when these thresholds
> a reached.
>
> This code takes the 64 sensors readings and calculates a single value
> it passes to one thermal zone. That thermal zone then controls one fan
> to keep this single value in range.
>
> I queried is this is the correct way to do this? Would it not be
> better to have up to 64 thermal zones? Leave the thermal core to
> iterate over all the zones in order to determine how the fan should be
> driven?
>
I very much think so. This problem must exist elsewhere; essentially
it is the bundling of multiple temperature sensors into a single thermal
zone. I am not sure if this should be 64 thermal zones or one thermal
zone with up to 64 sensors and some algorithm to select the relevant
temperature; that would be up to the thermal subsystem maintainers
to decide. Either case, the sensors should be handled and reported
as individual sensors, with appropriate limits, not as single sensor.
Yes, I understand that means we'll have hundreds of hwmon devices,
but that should not be a problem (and if it is, we'll have to fix
the problem, not the code exposing it).
I understand that the thermal subsystem does not currently support
handling this problem. There may also be some missing pieces between
the hwmon and thermal subsystems, such as reporting limits or alarms
when a hwmon driver register with the thermal subsystem.
Maybe it is time to add this support as part of this patch series ?
> This is possibly the first board with so many sensors. However, i
> doubt it is totally unique. Other big Ethernet switches with lots of
> SFP modules may be added later. Also, 10G copper PHYs often have
> temperature sensors, so this is not limited to just boards with
> optical ports. So having a generic solution would be good.
Agreed.
Thanks,
Guenter
>
> What do the Linux PM exports say about this?
>
> Thanks
> Andrew
Powered by blists - more mailing lists