lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACKFLikZgXQLnSN95vQgzZqmFisrxUvKiqF78RYcMAA2BE+6Zg@mail.gmail.com>
Date: Sun, 5 Nov 2023 19:50:01 -0800
From: Michael Chan <michael.chan@...adcom.com>
To: alexey.pakhunov@...cex.com
Cc: mchan@...adcom.com, vincent.wong2@...cex.com, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, siva.kallam@...adcom.com, prashant@...adcom.com
Subject: Re: [PATCH v3] tg3: Fix the TX ring stall

On Sun, Nov 5, 2023 at 10:58 AM <alexey.pakhunov@...cex.com> wrote:
>
> From: Alex Pakhunov <alexey.pakhunov@...cex.com>
>
> The TX ring maintained by the tg3 driver can end up in the state, when it
> has packets queued for sending but the NIC hardware is not informed, so no
> progress is made. This leads to a multi-second interruption in network
> traffic followed by dev_watchdog() firing and resetting the queue.
>
> The specific sequence of steps is:
>
> 1. tg3_start_xmit() is called at least once and queues packet(s) without
>    updating tnapi->prodmbox (netdev_xmit_more() returns true)
> 2. tg3_start_xmit() is called with an SKB which causes tg3_tso_bug() to be
>    called.
> 3. tg3_tso_bug() determines that the SKB is too large, ...
>
>         if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) {
>
>    ... stops the queue, and returns NETDEV_TX_BUSY:
>
>         netif_tx_stop_queue(txq);
>         ...
>         if (tg3_tx_avail(tnapi) <= frag_cnt_est)
>                 return NETDEV_TX_BUSY;
>
> 4. Since all tg3_tso_bug() call sites directly return, the code updating
>    tnapi->prodmbox is skipped.
>
> 5. The queue is stuck now. tg3_start_xmit() is not called while the queue
>    is stopped. The NIC is not processing new packets because
>    tnapi->prodmbox wasn't updated. tg3_tx() is not called by
>    tg3_poll_work() because the all TX descriptions that could be freed has
>    been freed:
>
>         /* run TX completion thread */
>         if (tnapi->hw_status->idx[0].tx_consumer != tnapi->tx_cons) {
>                 tg3_tx(tnapi);
>
> 6. Eventually, dev_watchdog() fires triggering a reset of the queue.
>
> This fix makes sure that the tnapi->prodmbox update happens regardless of
> the reason tg3_start_xmit() returned.
>
> Signed-off-by: Alex Pakhunov <alexey.pakhunov@...cex.com>
> Signed-off-by: Vincent Wong <vincent.wong2@...cex.com>
> ---
> v3: Split "Fix the TX ring stall" into a standalone patch. No code changes from v2.
> v2: https://lore.kernel.org/netdev/CACKFLi=ZLAb1Y92LwvqjOGPCuinka7qbHwDP2pkG4-_a7DMorQ@mail.gmail.com/T/#t
>     - Sort the local variables in tg3_start_xmit() in the RCS order
> v1: https://lore.kernel.org/netdev/20231101191858.2611154-1-alexey.pakhunov@spacex.com/T/#t
> ---

Thanks.
Reviewed-by: Michael Chan <michael.chan@...adcom.com>

Download attachment "smime.p7s" of type "application/pkcs7-signature" (4209 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ