Message-ID: <DB8PR04MB67950F82563DF00432D4E56AE6D10@DB8PR04MB6795.eurprd04.prod.outlook.com>
Date: Tue, 5 Jan 2021 13:43:39 +0000
From: Joakim Zhang <qiangqing.zhang@....com>
To: "peppe.cavallaro@...com" <peppe.cavallaro@...com>,
"alexandre.torgue@...com" <alexandre.torgue@...com>,
"joabreu@...opsys.com" <joabreu@...opsys.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>
Subject: suspend/resume issue in stmmac driver
Hi guys,
While running a suspend/resume stress test with the stmmac driver, I encountered some tricky issues. The DWC EQOS version is 5.10, and the Linux kernel version is 5.10.
1. The first issue is a net watchdog timeout.
stmmac_xmit() calls stmmac_tx_timer_arm() at the end to arm a timer that does the transmission cleanup work. Imagine this situation: stmmac enters suspend immediately after stmmac_xmit() arms the tx timer, so stmmac_tx_clean() is never invoked.
This affects BQL (I still don't know the specific reason): since netdev_tx_completed_queue() has not been called, dql_avail(&dev_queue->dql) ends up always returning a negative value.
__dev_xmit_skb() -> qdisc_run() -> __qdisc_run() -> qdisc_restart() -> dequeue_skb():
if ((q->flags & TCQ_F_ONETXQUEUE) &&
netif_xmit_frozen_or_stopped(txq)) // __QUEUE_STATE_STACK_XOFF bit is set
After this check, the net core stops transmitting. As a result, the net watchdog times out. To fix this issue, we should call netdev_tx_reset_queue() in stmmac_resume().
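To make the accounting above concrete, here is a minimal userspace sketch of the stall and of the proposed reset. It is a toy model, not kernel code: every toy_* name is hypothetical, the limit is fixed (real BQL adapts it dynamically), and only the interaction between the sent/completed/reset calls and the XOFF bit is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's struct dql (hypothetical, simplified). */
struct toy_dql {
    int num_queued;     /* bytes handed to the hardware so far */
    int num_completed;  /* bytes reported as completed so far */
    int limit;          /* fixed in-flight budget; real BQL adapts this */
};

struct toy_txq {
    struct toy_dql dql;
    bool stack_xoff;    /* models __QUEUE_STATE_STACK_XOFF */
};

/* Remaining budget; negative when too many bytes are still in flight. */
static int toy_dql_avail(const struct toy_dql *dql)
{
    return dql->limit + dql->num_completed - dql->num_queued;
}

/* Models netdev_tx_sent_queue(), called from the xmit path. */
static void toy_tx_sent(struct toy_txq *q, int bytes)
{
    assert(bytes > 0);
    q->dql.num_queued += bytes;
    if (toy_dql_avail(&q->dql) < 0)
        q->stack_xoff = true;   /* stack stops feeding this queue */
}

/* Models netdev_tx_completed_queue(), called from the TX clean path. */
static void toy_tx_completed(struct toy_txq *q, int bytes)
{
    assert(bytes > 0);
    q->dql.num_completed += bytes;
    if (toy_dql_avail(&q->dql) >= 0)
        q->stack_xoff = false;  /* budget is back, restart the queue */
}

/* Models netdev_tx_reset_queue(): drop all in-flight accounting. */
static void toy_tx_reset(struct toy_txq *q)
{
    q->dql.num_queued = 0;
    q->dql.num_completed = 0;
    q->stack_xoff = false;
}

/*
 * Scenario from this report: a frame is queued, then suspend hits before
 * the cleanup timer fires. Returns true if the queue is still stopped
 * after resume.
 */
static bool toy_stalled_after_resume(bool reset_on_resume)
{
    struct toy_txq q = { .dql = { .limit = 1000 } };

    toy_tx_sent(&q, 600);        /* normal traffic: queued ...        */
    toy_tx_completed(&q, 600);   /* ... and cleaned up as usual       */

    toy_tx_sent(&q, 1500);       /* over budget, XOFF is set          */
    /* suspend here: the completion for these 1500 bytes is never reported */

    if (reset_on_resume)
        toy_tx_reset(&q);        /* the fix proposed for the resume path */

    return q.stack_xoff;
}
```

In this model, toy_stalled_after_resume(false) leaves the XOFF flag set with nothing left to clear it, which matches the watchdog timeout; toy_stalled_after_resume(true) brings the queue back to a clean state.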
2. The second issue is an Rx channel fatal bus error.
During the suspend/resume test, the Rx channel reports a fatal bus error with high probability (and reports it many times), but the stmmac driver has no handler for this situation. Do you know what could cause an Rx channel fatal bus error, and how to handle it?
I have done some work on this, but I still can't fix it.
Thanks a lot in advance. 😊
Best Regards,
Joakim Zhang