Message-ID: <a673f379-9b0c-4d02-8884-23c62930513a@arista.com>
Date: Fri, 31 Oct 2025 10:43:36 -0700
From: Christoph Schwarz <cschwarz@...sta.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Neal Cardwell <ncardwell@...gle.com>, netdev@...r.kernel.org
Subject: Re: TCP sender stuck despite receiving ACKs from the peer



On 10/31/25 02:06, Eric Dumazet wrote:
> On Thu, Oct 23, 2025 at 10:57 PM Eric Dumazet <edumazet@...gle.com> wrote:
>>
[...]
>> Could you try the following patch ?
>>
>> Thanks again !
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 378c2d010faf251ffd874ebf0cc3dd6968eee447..8efda845611129920a9ae21d5e9dd05ffab36103 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -4796,6 +4796,8 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>>                   * to -1 or to their cpu id, but not to our id.
>>                   */
>>                  if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
>> +                       struct sk_buff *orig;
>> +
>>                          if (dev_xmit_recursion())
>>                                  goto recursion_alert;
>>
>> @@ -4805,6 +4807,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>>
>>                          HARD_TX_LOCK(dev, txq, cpu);
>>
>> +                       orig = skb;
>>                          if (!netif_xmit_stopped(txq)) {
>>                                  dev_xmit_recursion_inc();
>>                                  skb = dev_hard_start_xmit(skb, dev, txq, &rc);
>> @@ -4817,6 +4820,11 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>>                          HARD_TX_UNLOCK(dev, txq);
>>                          net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
>>                                               dev->name);
>> +                       if (skb != orig) {
>> +                               /* If at least one packet was sent, we must return NETDEV_TX_OK */
>> +                               rc = NETDEV_TX_OK;
>> +                               goto unlock;
>> +                       }
>>                  } else {
>>                          /* Recursion is detected! It is possible,
>>                           * unfortunately
>> @@ -4828,6 +4836,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>>          }
>>
>>          rc = -ENETDOWN;
>> +unlock:
>>          rcu_read_unlock_bh();
>>
>>          dev_core_stats_tx_dropped_inc(dev);
> 
> Hi Christoph
> 
> Any progress on your side ?
> 
> Thanks.

Hi Eric,

Thanks for your help. This is much appreciated.

We tried your patch but unfortunately it did not help. We have some 
ideas why that is. Here is what we figured out:

It is very likely that device stacking as described in my previous mail 
is a factor.

49: vlan0@...ent: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
      link/ether 02:1c:a7:00:00:01 brd ff:ff:ff:ff:ff:ff
3: parent: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 10000 qdisc prio state UNKNOWN mode DEFAULT group default qlen 1000
      link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff

The "parent" device is served by a proprietary device driver for a 
switch ASIC. The driver implements TX flow control and stops the TX 
queue frequently. It does not have TSO capabilities. We could look 
into adding that, but as of now it is not an option.

The "vlan0" device stacked on top is Linux kernel code 
(net/8021q/vlan_dev.c) and has the IP address to which the HTTP server 
binds. However, its TX queue never stops.

So we can end up in a situation where the TX queue on the underlying 
device is stopped, but the TX queue on the stacked vlan0 device is not. 
In this situation, we see return codes of NET_XMIT_DROP (1).
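
A toy user-space model of this situation, in case it helps the 
discussion. It is only a sketch under assumptions: it assumes the drop 
comes from the parent's qdisc backlog filling up while the driver queue 
is stopped, and that the stacked device simply passes the lower 
device's return code back up. The toy_* names are hypothetical; only 
the NET_XMIT_* values are taken from include/linux/netdevice.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values mirror include/linux/netdevice.h. */
enum {
	NET_XMIT_SUCCESS = 0x00,
	NET_XMIT_DROP    = 0x01,
};

/* Hypothetical stand-in for the lower ("parent") device. */
struct toy_lower_dev {
	bool   txq_stopped;   /* driver flow control has stopped the TX queue */
	size_t backlog;       /* packets currently waiting in the qdisc */
	size_t backlog_limit; /* assumed qdisc capacity */
};

/* Queue one packet on the lower device (assumed behaviour, not kernel code). */
static int toy_lower_queue_xmit(struct toy_lower_dev *lower)
{
	if (!lower->txq_stopped) {
		/* Queue is running: model the backlog as drained, packet sent. */
		lower->backlog = 0;
		return NET_XMIT_SUCCESS;
	}
	if (lower->backlog < lower->backlog_limit) {
		/* Queue is stopped but there is still room to hold the packet. */
		lower->backlog++;
		return NET_XMIT_SUCCESS;
	}
	/* Queue is stopped and the backlog is full: the packet is dropped. */
	return NET_XMIT_DROP;
}

/* The stacked device has no queue of its own; it forwards the result. */
static int toy_vlan_start_xmit(struct toy_lower_dev *lower)
{
	return toy_lower_queue_xmit(lower);
}
```

With backlog_limit set to 2 and txq_stopped set to true, the first two 
calls to toy_vlan_start_xmit() return NET_XMIT_SUCCESS and the third 
returns NET_XMIT_DROP, while the stacked device itself never reports a 
stopped queue.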

This means execution never reaches the code that you patched in: with 
rc=1, dev_xmit_complete(rc) is always true, so it goes to out. And 
because the TX queue on vlan0 is never stopped, it always enters the 
"!netif_xmit_stopped(txq)" block and never skips over it, which again 
prevents the new code from ever being executed.

if (!netif_xmit_stopped(txq)) {
	dev_xmit_recursion_inc();
	skb = dev_hard_start_xmit(skb, dev, txq, &rc);
	dev_xmit_recursion_dec();
	if (dev_xmit_complete(rc)) {
		HARD_TX_UNLOCK(dev, txq);
		goto out;
	}
}
HARD_TX_UNLOCK(dev, txq);
net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
		     dev->name);
if (skb != orig) {
	/* If at least one packet was sent, we must return NETDEV_TX_OK */
	rc = NETDEV_TX_OK;
	goto unlock;
}

I think for your patch to work we would need to see a NETDEV_TX_BUSY 
(0x10) rc from dev_hard_start_xmit, but that does not seem to happen, 
maybe due to the device stacking?
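
To make the return-code reasoning concrete, here is a small user-space 
sketch of the check as I understand it. The constants mirror 
include/linux/netdevice.h; xmit_complete_sketch() is a hypothetical 
name for a copy of the dev_xmit_complete() comparison, not the kernel 
function itself.

```c
#include <stdbool.h>

/* Values mirror include/linux/netdevice.h. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01 /* skb dropped */
#define NET_XMIT_CN      0x02 /* congestion notification */
#define NET_XMIT_MASK    0x0f /* qdisc flags in the return code */
#define NETDEV_TX_OK     0x00 /* driver took care of the packet */
#define NETDEV_TX_BUSY   0x10 /* driver TX path was busy */

/*
 * Sketch of the dev_xmit_complete() test: true means the skb is treated
 * as consumed (sent, errored, or dropped while queueing to another
 * device), so the caller goes to "out" and does not fall through.
 */
static bool xmit_complete_sketch(int rc)
{
	return rc < NET_XMIT_MASK;
}
```

xmit_complete_sketch(NET_XMIT_DROP) is true, so with rc=1 the caller 
takes the "goto out" path. Only a value such as NETDEV_TX_BUSY, which 
is not below NET_XMIT_MASK, makes the check false and lets execution 
fall through to the code after HARD_TX_UNLOCK.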

best regards,
Chris

