lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <686fc5051bdb8_fd38829485@willemb.c.googlers.com.notmuch>
Date: Thu, 10 Jul 2025 09:49:57 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Yun Lu <luyun_611@....com>, 
 willemdebruijn.kernel@...il.com, 
 davem@...emloft.net, 
 edumazet@...gle.com, 
 kuba@...nel.org, 
 pabeni@...hat.com, 
 horms@...nel.org
Cc: netdev@...r.kernel.org, 
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 2/3] af_packet: fix soft lockup issue caused by
 tpacket_snd()

Yun Lu wrote:
> From: Yun Lu <luyun@...inos.cn>
> 
> When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
> pending_refcnt to decrement to zero before returning. The pending_refcnt
> is decremented by 1 when the skb->destructor function is called,
> indicating that the skb has been successfully sent and needs to be
> destroyed.
> 
> If an error occurs during this process, the tpacket_snd() function will
> exit and return error, but pending_refcnt may not yet have decremented to
> zero. Assuming the next send operation is executed immediately, but there
> are no available frames to be sent in tx_ring (i.e., packet_current_frame
> returns NULL), and skb is also NULL, the function will not execute
> wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
> will enter a do-while loop, waiting for pending_refcnt to be zero. Even
> if the previous skb has completed transmission, the skb->destructor
> function can only be invoked in the ksoftirqd thread (assuming NAPI
> threading is enabled). When both the ksoftirqd thread and the tpacket_snd
> operation happen to run on the same CPU, and the CPU trapped in the
> do-while loop without yielding, the ksoftirqd thread will not get
> scheduled to run. As a result, pending_refcnt will never be reduced to
> zero, and the do-while loop cannot exit, eventually leading to a CPU soft
> lockup issue.
> 
> In fact, skb is true for all but the first iterations of that loop, and
> as long as pending_refcnt is not zero, even if incremented by a previous
> call, wait_for_completion_interruptible_timeout() should be executed to
> yield the CPU, allowing the ksoftirqd thread to be scheduled. Therefore,
> the execution condition of this function should be modified to check if
> pending_refcnt is not zero, instead of check skb.
> 
> As a result, packet_read_pending() may be called twice in the loop. This
> will be optimized in the following patch.
> 
> Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
> Cc: stable@...nel.org
> Suggested-by: LongJun Tang <tanglongjun@...inos.cn>
> Signed-off-by: Yun Lu <luyun@...inos.cn>
> 
> ---
> Changes in v4:
> - Split to the fix alone. Thanks: Willem de Bruijn.
> - Link to v3: https://lore.kernel.org/all/20250709095653.62469-3-luyun_611@163.com/
> 
> Changes in v3:
> - Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
> - Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/
> 
> Changes in v2:
> - Add a Fixes tag.
> - Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/
> ---
> ---
>  net/packet/af_packet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 7089b8c2a655..581a96ec8e1a 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2846,7 +2846,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>  		ph = packet_current_frame(po, &po->tx_ring,
>  					  TP_STATUS_SEND_REQUEST);
>  		if (unlikely(ph == NULL)) {
> -			if (need_wait && skb) {
> +			if (need_wait && packet_read_pending(&po->tx_ring)) {

Unfortunately I did not immediately fully appreciate Eric's
suggestion.

My comments was

    If [..] the extra packet_read_pending() is already present, not
    newly introduced with the fix

But of course that expensive call is newly introduced, so my
suggestion was invalid.

It's btw also not possible to mix net and net-next patches in a single
series like this (see Documentation/process/maintainer-netdev.rst).

But, instead of going back entirely to v2, perhaps we can make the
logic a bit more obvious by just having a while (1) at the end to show
that the only way to exit the loop (except errors) is in the ph == NULL
branch. And break in that loop directly.

There are two other ways to reach that while statement. A continue
on PACKET_SOCK_TP_LOSS, or by regular control flow. In both cases, ph
is non-zero, so the condition is true anyway.

>  				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
>  				if (timeo <= 0) {
>  					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
> -- 
> 2.43.0
> 



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ