lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 11 Nov 2021 17:16:12 +0200
From:   Vladimir Oltean <olteanv@...il.com>
To:     Joakim Zhang <qiangqing.zhang@....com>
Cc:     peppe.cavallaro@...com, alexandre.torgue@...s.st.com,
        joabreu@...opsys.com, davem@...emloft.net, kuba@...nel.org,
        linux-imx@....com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net: stmmac: fix system hang caused by
 eee_ctrl_timer during suspend/resume

Hi Joakim,

On Wed, Sep 08, 2021 at 03:43:35PM +0800, Joakim Zhang wrote:
> commit 5f58591323bf ("net: stmmac: delete the eee_ctrl_timer after
> napi disabled"), this patch tries to fix system hang caused by eee_ctrl_timer,
> unfortunately, it only can resolve it for system reboot stress test. System
> hang also can be reproduced easily during system suspend/resume stess test
> when mount NFS on i.MX8MP EVK board.
> 
> In stmmac driver, eee feature is combined to phylink framework. When do
> system suspend, phylink_stop() would queue delayed work, it invokes
> stmmac_mac_link_down(), where to deactivate eee_ctrl_timer synchronizly.
> In above commit, try to fix issue by deactivating eee_ctrl_timer obviously,
> but it is not enough. Looking into eee_ctrl_timer expire callback
> stmmac_eee_ctrl_timer(), it could enable hareware eee mode again. What is
> unexpected is that LPI interrupt (MAC_Interrupt_Enable.LPIEN bit) is always
> asserted. This interrupt has chance to be issued when LPI state entry/exit
> from the MAC, and at that time, clock could have been already disabled.
> The result is that system hang when driver try to touch register from
> interrupt handler.
> 
> The reason why above commit can fix system hang issue in stmmac_release()
> is that, deactivate eee_ctrl_timer not just after napi disabled, further
> after irq freed.
> 
> In conclusion, hardware would generate LPI interrupt when clock has been
> disabled during suspend or resume, since hardware is in eee mode and LPI
> interrupt enabled.
> 
> Interrupts from MAC, MTL and DMA level are enabled and never been disabled
> when system suspend, so postpone clocks management from suspend stage to
> noirq suspend stage should be more safe.
> 
> Fixes: 5f58591323bf ("net: stmmac: delete the eee_ctrl_timer after napi disabled")
> Signed-off-by: Joakim Zhang <qiangqing.zhang@....com>
> ---
>  .../net/ethernet/stmicro/stmmac/stmmac_main.c | 14 ------
>  .../ethernet/stmicro/stmmac/stmmac_platform.c | 44 +++++++++++++++++++
>  2 files changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index ece02b35a6ce..246f84fedbc8 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -7118,7 +7118,6 @@ int stmmac_suspend(struct device *dev)
>  	struct net_device *ndev = dev_get_drvdata(dev);
>  	struct stmmac_priv *priv = netdev_priv(ndev);
>  	u32 chan;
> -	int ret;
>  
>  	if (!ndev || !netif_running(ndev))
>  		return 0;
> @@ -7150,13 +7149,6 @@ int stmmac_suspend(struct device *dev)
>  	} else {
>  		stmmac_mac_set(priv, priv->ioaddr, false);
>  		pinctrl_pm_select_sleep_state(priv->device);
> -		/* Disable clock in case of PWM is off */
> -		clk_disable_unprepare(priv->plat->clk_ptp_ref);

This patch looks strange to me. You are basically saying that the LPI
timer for MAC-level EEE is clocked from the clk_ptp_ref clock?! Are you
sure this is correct? I thought this clock is only used for the PTP
timestamping counter. Maybe the clock definitions in imx8mp.dtsi are not
correct?

> -		ret = pm_runtime_force_suspend(dev);
> -		if (ret) {
> -			mutex_unlock(&priv->lock);
> -			return ret;
> -		}
>  	}
>  
>  	mutex_unlock(&priv->lock);
> @@ -7242,12 +7234,6 @@ int stmmac_resume(struct device *dev)
>  		priv->irq_wake = 0;
>  	} else {
>  		pinctrl_pm_select_default_state(priv->device);
> -		/* enable the clk previously disabled */
> -		ret = pm_runtime_force_resume(dev);
> -		if (ret)
> -			return ret;
> -		if (priv->plat->clk_ptp_ref)
> -			clk_prepare_enable(priv->plat->clk_ptp_ref);
>  		/* reset the phy so that it's ready */
>  		if (priv->mii)
>  			stmmac_mdio_reset(priv->mii);
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> index 5ca710844cc1..4885f9ad1b1e 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> @@ -9,6 +9,7 @@
>  *******************************************************************************/
>  
>  #include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/module.h>
>  #include <linux/io.h>
>  #include <linux/of.h>
> @@ -771,9 +772,52 @@ static int __maybe_unused stmmac_runtime_resume(struct device *dev)
>  	return stmmac_bus_clks_config(priv, true);
>  }
>  
> +static int stmmac_pltfr_noirq_suspend(struct device *dev)
> +{
> +	struct net_device *ndev = dev_get_drvdata(dev);
> +	struct stmmac_priv *priv = netdev_priv(ndev);
> +	int ret;
> +
> +	if (!netif_running(ndev))
> +		return 0;
> +
> +	if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
> +		/* Disable clock in case of PWM is off */
> +		clk_disable_unprepare(priv->plat->clk_ptp_ref);
> +
> +		ret = pm_runtime_force_suspend(dev);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int stmmac_pltfr_noirq_resume(struct device *dev)
> +{
> +	struct net_device *ndev = dev_get_drvdata(dev);
> +	struct stmmac_priv *priv = netdev_priv(ndev);
> +	int ret;
> +
> +	if (!netif_running(ndev))
> +		return 0;
> +
> +	if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
> +		/* enable the clk previously disabled */
> +		ret = pm_runtime_force_resume(dev);
> +		if (ret)
> +			return ret;
> +
> +		clk_prepare_enable(priv->plat->clk_ptp_ref);
> +	}
> +
> +	return 0;
> +}
> +
>  const struct dev_pm_ops stmmac_pltfr_pm_ops = {
>  	SET_SYSTEM_SLEEP_PM_OPS(stmmac_pltfr_suspend, stmmac_pltfr_resume)
>  	SET_RUNTIME_PM_OPS(stmmac_runtime_suspend, stmmac_runtime_resume, NULL)
> +	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(stmmac_pltfr_noirq_suspend, stmmac_pltfr_noirq_resume)
>  };
>  EXPORT_SYMBOL_GPL(stmmac_pltfr_pm_ops);
>  
> -- 
> 2.17.1
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ