[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 12 Dec 2018 17:08:11 +0100
From: Andrew Lunn <andrew@...n.ch>
To: Igor Russkikh <Igor.Russkikh@...antia.com>
Cc: "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Dmitry Bezrukov <Dmitry.Bezrukov@...antia.com>
Subject: Re: [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling
feature
On Wed, Dec 12, 2018 at 01:50:10PM +0000, Igor Russkikh wrote:
> From: Dmitry Bezrukov <dmitry.bezrukov@...antia.com>
>
> Lab testing shows that chip may get overheated when
> network link is connected on high speed (2.5G/5G).
>
> Default hardware design uses only passive heatsink without a cooler,
> and that makes things worse.
>
> To prevent possible chip damage here we add throttling mechanism.
>
> There is a worker that monitors the PHY's temperature via reading
> PHY registers. When PHY's temperature reaches the critical threshold
> (it is 106 degrees of Celsius) it changes the link speed to the lowest
> regardless user/default link speed settings. It should reduce the PHY's
> temperature to normal numbers.
>
> When the PHY's temparature is back to normal (low threshold, it is
> 85 degrees) it restores user/default link speed settings.
Hi Igor
Please could you also export the temperature using HWMON. The Marvell
PHY drivers are good examples.
I also have a patch for driver/net/phy/aquantia.c which adds HWMON
support to that PHY.
>
> Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@...antia.com>
> Signed-off-by: Igor Russkikh <igor.russkikh@...antia.com>
> ---
> drivers/net/usb/aqc111.c | 78 ++++++++++++++++++++++++++++++++++++++++
> drivers/net/usb/aqc111.h | 8 +++++
> 2 files changed, 86 insertions(+)
>
> diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c
> index 8867f6a3eaa7..fa6b9dfc2a6f 100644
> --- a/drivers/net/usb/aqc111.c
> +++ b/drivers/net/usb/aqc111.c
> @@ -17,6 +17,7 @@
> #include <linux/usb/cdc.h>
> #include <linux/usb/usbnet.h>
> #include <linux/linkmode.h>
> +#include <linux/workqueue.h>
>
> #include "aqc111.h"
>
> @@ -334,6 +335,9 @@ static void aqc111_set_phy_speed(struct usbnet *dev, u8 autoneg, u16 speed)
> aqc111_data->phy_cfg |= (3 << AQ_DSH_RETRIES_SHIFT) &
> AQ_DSH_RETRIES_MASK;
>
> + if (aqc111_data->thermal_throttling)
> + speed = SPEED_100;
> +
That is a big jump. Do you need to cool is down quickly, or would
1Gbps be sufficient? I think as a user, i would prefer it ping-pongs
between 5G and 1G, not 5G and 100M.
> if (autoneg == AUTONEG_ENABLE) {
> switch (speed) {
> case SPEED_5000:
It looks like this only works when auto-neg is enabled. If i've fixed
configured it i don't think this works?
> @@ -714,6 +718,8 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf)
> /* store aqc111_data pointer in device data field */
> dev->driver_priv = aqc111_data;
>
> + aqc111_data->dev = dev;
> +
> /* Init the MAC address */
> ret = aqc111_read_perm_mac(dev);
> if (ret)
> @@ -991,6 +997,71 @@ static int aqc111_link_reset(struct usbnet *dev)
> return 0;
> }
>
> +int aqc111_get_temperature(struct usbnet *dev, u32 *temperature)
> +{
> + u16 reg16 = 0;
> +
> + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS2, AQ_PHY_GLOBAL_ADDR,
> + ®16) < 0)
> + goto err;
> +
> + if (!(reg16 & AQ_THERM_ST_READY))
> + goto err;
> +
> + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS1, AQ_PHY_GLOBAL_ADDR,
> + ®16) < 0)
> + goto err;
> +
> + /*convert from 1/256 to 1/100 degrees of Celsius*/
> + *temperature = (u32)reg16 * 100 / 256;
> + return 0;
> +
> +err:
> + *temperature = 0;
> + return -1;
> +}
> +
> +void aqc111_thermal_work_cb(struct work_struct *w)
> +{
> + struct delayed_work *dw = to_delayed_work(w);
> + struct aqc111_data *aqc111_data = container_of(dw, struct aqc111_data,
> + therm_work);
> + unsigned long timeout = msecs_to_jiffies(AQ_THERMAL_TIMER_MS);
> + struct usbnet *dev = aqc111_data->dev;
> + u32 temperature = 0;
> + u8 reset_speed = 0;
> +
> + if (!aqc111_data->link)
> + /* poll not so frequently */
> + timeout *= 2;
> +
> + if (aqc111_get_temperature(dev, &temperature) != 0)
> + goto end;
> +
> + if (aqc111_data->thermal_throttling &&
> + temperature <= AQ_NORMAL_TEMP_THRESHOLD) {
> + netdev_info(dev->net, "The temperature is back to normal(%u)",
> + AQ_NORMAL_TEMP_THRESHOLD / 100);
How often do you see these messages? I'm wondering if they need to be
rate limited?
> + aqc111_data->thermal_throttling = 0;
> + reset_speed = 1;
> + }
> +
> + if (!aqc111_data->thermal_throttling &&
> + temperature >= AQ_CRITICAL_TEMP_THRESHOLD) {
Should there be some hysteresis in here? In fact, if temperature is
AQ_CRITICAL_TEMP_THRESHOLD it is both back to normal, and throttled at
the same time!
> + netdev_warn(dev->net, "Critical temperature(%u) is reached",
> + AQ_CRITICAL_TEMP_THRESHOLD / 100);
> + aqc111_data->thermal_throttling = 1;
> + reset_speed = 1;
update_speed might be a better name, since you are not always
resetting it.
Andrew
Powered by blists - more mailing lists