Message-ID: <687166c6cbc8c_168265294ba@willemb.c.googlers.com.notmuch>
Date: Fri, 11 Jul 2025 15:32:22 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Yun Lu <luyun_611@....com>,
willemdebruijn.kernel@...il.com,
davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
horms@...nel.org
Cc: netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 2/2] af_packet: fix soft lockup issue caused by
tpacket_snd()
Yun Lu wrote:
> From: Yun Lu <luyun@...inos.cn>
>
> When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
> pending_refcnt to decrement to zero before returning. The pending_refcnt
> is decremented by 1 when the skb->destructor function is called,
> indicating that the skb has been successfully sent and needs to be
> destroyed.
>
> If an error occurs during this process, the tpacket_snd() function will
> exit and return an error, but pending_refcnt may not yet have decremented to
> zero. Assuming the next send operation is executed immediately, but there
> are no available frames to be sent in tx_ring (i.e., packet_current_frame
> returns NULL), and skb is also NULL, the function will not execute
> wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
> will enter a do-while loop, waiting for pending_refcnt to be zero. Even
> if the previous skb has completed transmission, the skb->destructor
> function can only be invoked in the ksoftirqd thread (assuming NAPI
> threading is enabled). When both the ksoftirqd thread and the tpacket_snd
> operation happen to run on the same CPU, and the CPU is trapped in the
> do-while loop without yielding, the ksoftirqd thread will not get
> scheduled to run. As a result, pending_refcnt will never be reduced to
> zero, and the do-while loop cannot exit, eventually leading to a CPU soft
> lockup issue.
>
> In fact, skb is non-NULL for all but the first iteration of that loop, and
> as long as pending_refcnt is not zero, even if incremented by a previous
> call, wait_for_completion_interruptible_timeout() should be executed to
> yield the CPU, allowing the ksoftirqd thread to be scheduled. Therefore,
> the execution condition of this function should be modified to check if
> pending_refcnt is not zero, instead of checking skb.
>
> - if (need_wait && skb) {
> + if (need_wait && packet_read_pending(&po->tx_ring)) {
>
> As a result, this condition duplicates the check at the end of the
> while loop, and packet_read_pending() is a very expensive function.
> Actually, this loop can only exit when ph is NULL, so the loop condition
> can be changed to while (1), and in the "ph == NULL" branch, if the
> wait condition is not met, the loop can break directly. Now, the loop
> logic remains the same as the original but is clearer and more obvious.
>
> Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
> Cc: stable@...nel.org
> Suggested-by: LongJun Tang <tanglongjun@...inos.cn>
> Signed-off-by: Yun Lu <luyun@...inos.cn>
Reviewed-by: Willem de Bruijn <willemb@...gle.com>
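
For readers following the control flow, here is a minimal user-space sketch of
the two loop shapes described in the quoted commit message. It is not the
kernel code: struct tx_model, model_yield(), send_old() and send_new() are
hypothetical names invented for illustration, "pending" stands in for
pending_refcnt, and model_yield() stands in for
wait_for_completion_interruptible_timeout() letting the skb destructor run.
The spin budget exists only so the old loop terminates in the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the TX ring state; not a kernel structure. */
struct tx_model {
	int frames_ready;	/* frames queued in the ring, waiting to be sent */
	int pending;		/* stands in for pending_refcnt */
	int yields;		/* times the sender gave up the CPU */
	int spins;		/* empty-ring iterations that did not yield */
};

/* Assumption: completions only make progress while the sender yields. */
static void model_yield(struct tx_model *m)
{
	m->yields++;
	if (m->pending > 0)
		m->pending--;
}

/* Old shape: wait only if an skb was built earlier in this call. */
static bool send_old(struct tx_model *m, bool need_wait, int budget)
{
	bool have_skb = false;	/* models "skb != NULL" */
	bool have_frame;

	assert(budget > 0);
	do {
		have_frame = m->frames_ready > 0;
		if (!have_frame) {
			if (need_wait && have_skb)
				model_yield(m);
			else if (++m->spins > budget)
				return false;	/* would spin without yielding */
			continue;
		}
		have_skb = true;
		m->frames_ready--;
		m->pending++;
	} while (have_frame || (need_wait && m->pending > 0));
	return true;
}

/* New shape: wait whenever completions are pending, else leave the loop. */
static void send_new(struct tx_model *m, bool need_wait)
{
	while (1) {
		if (m->frames_ready == 0) {
			if (need_wait && m->pending > 0) {
				model_yield(m);
				continue;	/* check for additional frames */
			}
			break;
		}
		m->frames_ready--;
		m->pending++;
	}
}
```

In this model, starting from an empty ring with one completion left over from
an earlier call (frames_ready = 0, pending = 1, need_wait set), send_old()
never yields and exhausts its budget, while send_new() yields once and
returns. When frames are available, both shapes yield the same number of times.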