lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <20231102172503.3413318-1-alexey.pakhunov@spacex.com> Date: Thu, 2 Nov 2023 10:25:01 -0700 From: <alexey.pakhunov@...cex.com> To: <mchan@...adcom.com> CC: <vincent.wong2@...cex.com>, <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>, <siva.kallam@...adcom.com>, <prashant@...adcom.com>, Alex Pakhunov <alexey.pakhunov@...cex.com> Subject: [PATCH v2 0/2] tg3: Fix the TX ring stall From: Alex Pakhunov <alexey.pakhunov@...cex.com> This patch fixes a problem with the tg3 driver we encountered on several machines having Broadcom 5719 NIC. The problem showed up as a 10-20 seconds interruption in network traffic and these dmegs message followed by the NIC registers dump: === dmesg === NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out ... RIP: 0010:dev_watchdog+0x21e/0x230 ... tg3 0000:02:00.2 eth0: transmit timed out, resetting === === The issue was observed with "4.15.0-52-lowlatency #56~16.04.1-Ubuntu" and "4.15.0-161-lowlatency #169~16.04.1-Ubuntu" kernels. Based on the state of the TX queue at the time of the reset and analysis of dev_watchdog() it appeared that the NIC has not been notified about packets accumulated in the TX ring for TG3_TX_TIMEOUT seconds and was reset: === dmesg === tg3 0000:02:00.2 eth0: 0: Host status block [00000001:000000a0:(0000:06d8:0000):(0000:01a0)] tg3 0000:02:00.2 eth0: 0: NAPI info [000000a0:000000a0:(0188:01a0:01ff):0000:(06f2:0000:0000:0000)] === === tnapi->hw_status->idx[0].tx_consumer is the same as tnapi->tx_cons (0x1a0) meaning that the driver has processed all TX descriptions released by the NIC. tnapi->tx_prod (0x188) is ahead of 0x1a0 meaning that there are more descriptors in the TX ring ready to be sent but the NIC does not know about that yet. Further analysis showed that tg3_start_xmit() can stop the TX queue and not tell the NIC about already enqueued packets. The specific sequence is: 1. tg3_start_xmit() is called at least once and queues packet(s) without updating tnapi->prodmbox (netdev_xmit_more() returns true) 2. tg3_start_xmit() is called with an SKB which causes tg3_tso_bug() to be called. 3. tg3_tso_bug() determines that the SKB is too large [L7860], ... if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) { ... stops the queue [L7861], and returns NETDEV_TX_BUSY [L7870]: netif_tx_stop_queue(txq); ... if (tg3_tx_avail(tnapi) <= frag_cnt_est) return NETDEV_TX_BUSY; 4. Since all tg3_tso_bug() call sites directly return, the code updating tnapi->prodmbox [L8138] is skipped. 5. The queue is stuck now. tg3_start_xmit() is not called while the queue is stopped. The NIC is not processing new packets because tnapi->prodmbox wasn't updated. tg3_tx() is not called by tg3_poll_work() because the all TX descriptions that could be freed has been freed [L7159]: /* run TX completion thread */ if (tnapi->hw_status->idx[0].tx_consumer != tnapi->tx_cons) { tg3_tx(tnapi); 6. Eventually, dev_watchdog() fires resetting the queue. As far as I can tell this sequence is still possible in HEAD of master. I could not reproduce this stall by generating traffic to match conditions required for tg3_tso_bug() to be called. Based on the driver's code the SKB must be a TSO or GSO skb; it should contain a VLAN tag or extra TCP header options; and it should be queued at exactly the right time. I believe that the last part is what makes reproducing it harder. However I was able to reproduce the stall by mimicing the behavior of tg3_tso_bug() in tg3_start_xmit(). I added the following lines to tg3_start_xmit() before "would_hit_hwbug = 0;" [L8046]: if (...) { netif_tx_stop_queue(txq); return NETDEV_TX_BUSY; } would_hit_hwbug = 0; The condition is not super relevant. It was used to control when the stall is induced, so that the network is not completely broken dueing testing. This approach reproduced the issue rather reliably. The proposed fix makes sure that the tnapi->prodmbox update happens regardless of the reason tg3_start_xmit() returned. It essentially moves the code updating tnapi->prodmbox from tg3_start_xmit() (which is renamed to __tg3_start_xmit()) to a new wrapper. This makes sure all retun paths are covered. I tested this fix with the code inducing the TX stall from above. The fix eliminated stalls completely. An aternative approch, jumping to the code updating tnapi->prodmbox after returning from tg3_tso_bug(), was considered. It yields a patch of almost the same size. There are four branches in tg3_start_xmit() that would need the goto: three tg3_tso_bug() call sites and the early return in the very beginning of tg3_start_xmit(). This seemed like a more fragile approach too since anyone modifying the function would need to be careful to preserve the invatiant of leaving it through a particular branch. Alex Pakhunov (2): tg3: Increment tx_dropped in tg3_tso_bug() tg3: Fix the TX ring stall drivers/net/ethernet/broadcom/tg3.c | 57 +++++++++++++++++++++++------ 1 file changed, 45 insertions(+), 12 deletions(-) base-commit: ffc253263a1375a65fa6c9f62a893e9767fbebfa -- 2.39.3
Powered by blists - more mailing lists