Message-ID: <DB8PR04MB67950F82563DF00432D4E56AE6D10@DB8PR04MB6795.eurprd04.prod.outlook.com>
Date:   Tue, 5 Jan 2021 13:43:39 +0000
From:   Joakim Zhang <qiangqing.zhang@....com>
To:     "peppe.cavallaro@...com" <peppe.cavallaro@...com>,
        "alexandre.torgue@...com" <alexandre.torgue@...com>,
        "joabreu@...opsys.com" <joabreu@...opsys.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>
Subject: suspend/resume issue in stmmac driver


Hi guys,

While running a suspend/resume stress test with the stmmac driver, I ran into some tricky issues. The DWC EQOS version is 5.10 and the Linux kernel version is 5.10.

1. The first issue is a net watchdog timeout.
stmmac_xmit() calls stmmac_tx_timer_arm() at the end to arm a timer that does the transmit cleanup work. Imagine that stmmac enters suspend immediately after stmmac_xmit() has armed the tx timer: stmmac_tx_clean() is then never invoked. This affects BQL (I still don't know the exact reason): since netdev_tx_completed_queue() is never called, dql_avail(&dev_queue->dql) ends up always returning a negative value.
	__dev_xmit_skb() -> qdisc_run() -> __qdisc_run() -> qdisc_restart() -> dequeue_skb():
         if ((q->flags & TCQ_F_ONETXQUEUE) &&
             netif_xmit_frozen_or_stopped(txq))  // __QUEUE_STATE_STACK_XOFF bit is set
Once this check hits, the net core stops transmitting, and as a result the net watchdog times out. To fix this issue, we should call netdev_tx_reset_queue() in stmmac_resume().
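
To make the accounting above concrete, here is a minimal userspace sketch in C. It is only a toy model under stated assumptions, not the kernel's dynamic queue limits code: every toy_* name is hypothetical, the byte limit is fixed instead of being adapted at runtime, and there is no locking. It only shows how bytes that are queued but never completed drive the available budget negative and leave the queue stopped until it is reset.

```c
#include <stdbool.h>

/* Toy stand-in for the kernel's struct dql; only the fields used here. */
struct toy_dql {
	unsigned int num_queued;    /* total bytes handed to the driver */
	unsigned int num_completed; /* total bytes reported as completed */
	unsigned int adj_limit;     /* limit + num_completed */
	unsigned int limit;         /* byte limit (fixed in this toy model) */
};

struct toy_txq {
	struct toy_dql dql;
	bool stack_xoff;            /* models __QUEUE_STATE_STACK_XOFF */
};

/* Remaining budget; negative means the queue is over its limit. */
static int toy_dql_avail(const struct toy_dql *dql)
{
	return (int)(dql->adj_limit - dql->num_queued);
}

/* Clears the accounting and the stopped flag (assumed role of netdev_tx_reset_queue()). */
static void toy_tx_reset(struct toy_txq *q)
{
	q->stack_xoff = false;
	q->dql.num_queued = 0;
	q->dql.num_completed = 0;
	q->dql.adj_limit = q->dql.limit;
}

static void toy_txq_init(struct toy_txq *q, unsigned int limit)
{
	q->dql.limit = limit;
	toy_tx_reset(q);
}

/* Xmit path: account queued bytes, stop the queue when over the limit. */
static void toy_tx_sent(struct toy_txq *q, unsigned int bytes)
{
	q->dql.num_queued += bytes;
	if (toy_dql_avail(&q->dql) < 0)
		q->stack_xoff = true;
}

/* Tx clean path: account completed bytes, restart the queue if possible. */
static void toy_tx_completed(struct toy_txq *q, unsigned int bytes)
{
	q->dql.num_completed += bytes;
	q->dql.adj_limit = q->dql.limit + q->dql.num_completed;
	if (toy_dql_avail(&q->dql) >= 0)
		q->stack_xoff = false;
}

/*
 * Scenario from the report: frames are queued, suspend happens before the
 * cleanup runs, so toy_tx_completed() is never called. Returns true if the
 * queue is stuck before the reset and usable again after it.
 */
static bool toy_suspend_scenario(void)
{
	struct toy_txq q;
	bool stuck;

	toy_txq_init(&q, 3000);
	toy_tx_sent(&q, 1500);
	toy_tx_sent(&q, 1500);
	toy_tx_sent(&q, 1500);      /* over the limit, no completion follows */
	stuck = q.stack_xoff && toy_dql_avail(&q.dql) < 0;

	toy_tx_reset(&q);           /* what the resume path would do */
	return stuck && !q.stack_xoff && toy_dql_avail(&q.dql) >= 0;
}
```

In the driver itself, the corresponding change would presumably be to call netdev_tx_reset_queue(netdev_get_tx_queue(dev, queue)) for each Tx queue in stmmac_resume(). That placement is an assumption, not a tested patch.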

2. The second issue is an Rx channel fatal bus error.
During the suspend/resume test, the Rx channel reports a fatal bus error with high probability (and reports it many times), but the stmmac driver has no handler for this situation. Do you know what could cause an Rx channel fatal bus error, and how to handle it?
I have done some investigation, but I still can't fix it.

Thanks a lot in advance. 😊

Best Regards,
Joakim Zhang
