lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <DB8PR04MB6795FCF1B1E5E0B03CA49B90E6959@DB8PR04MB6795.eurprd04.prod.outlook.com>
Date:   Fri, 12 Nov 2021 01:21:57 +0000
From:   Joakim Zhang <qiangqing.zhang@....com>
To:     Vladimir Oltean <olteanv@...il.com>
CC:     "peppe.cavallaro@...com" <peppe.cavallaro@...com>,
        "alexandre.torgue@...s.st.com" <alexandre.torgue@...s.st.com>,
        "joabreu@...opsys.com" <joabreu@...opsys.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        dl-linux-imx <linux-imx@....com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH net] net: stmmac: fix system hang caused by eee_ctrl_timer
 during suspend/resume


Hi Vladimir,

> -----Original Message-----
> From: Vladimir Oltean <olteanv@...il.com>
> Sent: 2021年11月11日 23:16
> To: Joakim Zhang <qiangqing.zhang@....com>
> Cc: peppe.cavallaro@...com; alexandre.torgue@...s.st.com;
> joabreu@...opsys.com; davem@...emloft.net; kuba@...nel.org;
> dl-linux-imx <linux-imx@....com>; netdev@...r.kernel.org;
> linux-kernel@...r.kernel.org
> Subject: Re: [PATCH net] net: stmmac: fix system hang caused by
> eee_ctrl_timer during suspend/resume
> 
> Hi Joakim,
> 
> On Wed, Sep 08, 2021 at 03:43:35PM +0800, Joakim Zhang wrote:
> > commit 5f58591323bf ("net: stmmac: delete the eee_ctrl_timer after
> > napi disabled"), this patch tries to fix system hang caused by
> > eee_ctrl_timer, unfortunately, it only can resolve it for system
> > reboot stress test. System hang also can be reproduced easily during
> > system suspend/resume stess test when mount NFS on i.MX8MP EVK
> board.
> >
> > In stmmac driver, eee feature is combined to phylink framework. When
> > do system suspend, phylink_stop() would queue delayed work, it invokes
> > stmmac_mac_link_down(), where to deactivate eee_ctrl_timer
> synchronizly.
> > In above commit, try to fix issue by deactivating eee_ctrl_timer
> > obviously, but it is not enough. Looking into eee_ctrl_timer expire
> > callback stmmac_eee_ctrl_timer(), it could enable hareware eee mode
> > again. What is unexpected is that LPI interrupt
> > (MAC_Interrupt_Enable.LPIEN bit) is always asserted. This interrupt
> > has chance to be issued when LPI state entry/exit from the MAC, and at
> that time, clock could have been already disabled.
> > The result is that system hang when driver try to touch register from
> > interrupt handler.
> >
> > The reason why above commit can fix system hang issue in
> > stmmac_release() is that, deactivate eee_ctrl_timer not just after
> > napi disabled, further after irq freed.
> >
> > In conclusion, hardware would generate LPI interrupt when clock has
> > been disabled during suspend or resume, since hardware is in eee mode
> > and LPI interrupt enabled.
> >
> > Interrupts from MAC, MTL and DMA level are enabled and never been
> > disabled when system suspend, so postpone clocks management from
> > suspend stage to noirq suspend stage should be more safe.
> >
> > Fixes: 5f58591323bf ("net: stmmac: delete the eee_ctrl_timer after
> > napi disabled")
> > Signed-off-by: Joakim Zhang <qiangqing.zhang@....com>
> > ---
> >  .../net/ethernet/stmicro/stmmac/stmmac_main.c | 14 ------
> > .../ethernet/stmicro/stmmac/stmmac_platform.c | 44
> +++++++++++++++++++
> >  2 files changed, 44 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index ece02b35a6ce..246f84fedbc8 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -7118,7 +7118,6 @@ int stmmac_suspend(struct device *dev)
> >  	struct net_device *ndev = dev_get_drvdata(dev);
> >  	struct stmmac_priv *priv = netdev_priv(ndev);
> >  	u32 chan;
> > -	int ret;
> >
> >  	if (!ndev || !netif_running(ndev))
> >  		return 0;
> > @@ -7150,13 +7149,6 @@ int stmmac_suspend(struct device *dev)
> >  	} else {
> >  		stmmac_mac_set(priv, priv->ioaddr, false);
> >  		pinctrl_pm_select_sleep_state(priv->device);
> > -		/* Disable clock in case of PWM is off */
> > -		clk_disable_unprepare(priv->plat->clk_ptp_ref);
> 
> This patch looks strange to me. You are basically saying that the LPI timer for
> MAC-level EEE is clocked from the clk_ptp_ref clock?! Are you sure this is
> correct? I thought this clock is only used for the PTP timestamping counter.
> Maybe the clock definitions in imx8mp.dtsi are not correct?

No, MAC-level EEE is not clocked from the clk_ptp_ref clock, this clock is for PTP. To fix this issue,
I can only move pm_runtime_force_suspend() into noirq suspend stage. As commit message said, 
"postpone clocks management from suspend stage to noirq suspend stage", that means I move all
the clocks management into noirq suspend stage, including PTP clock, it's should be harmless for PTP
function. Do you find any regression with this patch? 

Best Regards,
Joakim Zhang
> > -		ret = pm_runtime_force_suspend(dev);
> > -		if (ret) {
> > -			mutex_unlock(&priv->lock);
> > -			return ret;
> > -		}
> >  	}
> >
> >  	mutex_unlock(&priv->lock);
> > @@ -7242,12 +7234,6 @@ int stmmac_resume(struct device *dev)
> >  		priv->irq_wake = 0;
> >  	} else {
> >  		pinctrl_pm_select_default_state(priv->device);
> > -		/* enable the clk previously disabled */
> > -		ret = pm_runtime_force_resume(dev);
> > -		if (ret)
> > -			return ret;
> > -		if (priv->plat->clk_ptp_ref)
> > -			clk_prepare_enable(priv->plat->clk_ptp_ref);
> >  		/* reset the phy so that it's ready */
> >  		if (priv->mii)
> >  			stmmac_mdio_reset(priv->mii);
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> > b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> > index 5ca710844cc1..4885f9ad1b1e 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> > @@ -9,6 +9,7 @@
> >
> >
> **********************************************************
> ************
> > *********/
> >
> >  #include <linux/platform_device.h>
> > +#include <linux/pm_runtime.h>
> >  #include <linux/module.h>
> >  #include <linux/io.h>
> >  #include <linux/of.h>
> > @@ -771,9 +772,52 @@ static int __maybe_unused
> stmmac_runtime_resume(struct device *dev)
> >  	return stmmac_bus_clks_config(priv, true);  }
> >
> > +static int stmmac_pltfr_noirq_suspend(struct device *dev) {
> > +	struct net_device *ndev = dev_get_drvdata(dev);
> > +	struct stmmac_priv *priv = netdev_priv(ndev);
> > +	int ret;
> > +
> > +	if (!netif_running(ndev))
> > +		return 0;
> > +
> > +	if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
> > +		/* Disable clock in case of PWM is off */
> > +		clk_disable_unprepare(priv->plat->clk_ptp_ref);
> > +
> > +		ret = pm_runtime_force_suspend(dev);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int stmmac_pltfr_noirq_resume(struct device *dev) {
> > +	struct net_device *ndev = dev_get_drvdata(dev);
> > +	struct stmmac_priv *priv = netdev_priv(ndev);
> > +	int ret;
> > +
> > +	if (!netif_running(ndev))
> > +		return 0;
> > +
> > +	if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
> > +		/* enable the clk previously disabled */
> > +		ret = pm_runtime_force_resume(dev);
> > +		if (ret)
> > +			return ret;
> > +
> > +		clk_prepare_enable(priv->plat->clk_ptp_ref);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  const struct dev_pm_ops stmmac_pltfr_pm_ops = {
> >  	SET_SYSTEM_SLEEP_PM_OPS(stmmac_pltfr_suspend,
> stmmac_pltfr_resume)
> >  	SET_RUNTIME_PM_OPS(stmmac_runtime_suspend,
> stmmac_runtime_resume,
> > NULL)
> > +	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(stmmac_pltfr_noirq_suspend,
> > +stmmac_pltfr_noirq_resume)
> >  };
> >  EXPORT_SYMBOL_GPL(stmmac_pltfr_pm_ops);
> >
> > --
> > 2.17.1
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ