Message-ID: <20250927092827.2707901-1-edumazet@google.com>
Date: Sat, 27 Sep 2025 09:28:27 +0000
From: Eric Dumazet <edumazet@...gle.com>
To: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Cc: Simon Horman <horms@...nel.org>, Neal Cardwell <ncardwell@...gle.com>,
Willem de Bruijn <willemb@...gle.com>, Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org,
eric.dumazet@...il.com, Eric Dumazet <edumazet@...gle.com>
Subject: [PATCH net-next] tcp: use skb->len instead of skb->truesize in tcp_can_ingest()

Some applications are stuck in the 20th century and still use
small SO_RCVBUF values.

After the blamed commit, we can drop packets, especially when
using an LRO/hw-gro enabled NIC and small MSS (1500) values.

LRO/hw-gro NICs pack multiple segments into pages, allowing
tp->scaling_ratio to be set to a high value.

Whenever the receive queue gets full, we can receive a small packet
filling RWIN, but with a high skb->truesize, because most NICs use a
4K page plus sk_buff metadata even when receiving less than 1500 bytes
of payload.
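
As a rough illustration (the exact overhead depends on the driver and
kernel configuration; these numbers are not from the patch):

    skb->len      =  1500 bytes of payload
    skb->truesize ~= 4096 (page fragment) + a few hundred bytes of
                     sk_buff / skb_shared_info metadata ~= 4.5 KB

so sk->sk_rmem_alloc grows roughly 3x faster than the payload, and a
small sk->sk_rcvbuf is exhausted well before the advertised window is.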

Even if we refine how tp->scaling_ratio is estimated,
we could have an issue at the start of the flow, because
the first round of packets (IW10) will be sent based on
the initial tp->scaling_ratio (1/2).

Relax tcp_can_ingest() to use skb->len instead of skb->truesize,
allowing the peer to use the full RWIN, assuming a 'perfect'
scaling_ratio of 1.
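
To illustrate the effect with made-up numbers (not from the patch):
with sk->sk_rcvbuf around 64 KB and the initial scaling_ratio of 1/2,
the announced window is about 32 KB, i.e. roughly 21 segments of 1500
bytes. At ~4.5 KB of truesize per segment, the truesize-based check
starts dropping in-window packets after about 14 segments, while the
len-based check accounts only the ~32 KB of payload, which fits.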
Fixes: 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks")
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
---
 net/ipv4/tcp_input.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 79d5252ed6cc1a24ec898f4168d47c39c6e92fe1..0a2511ce34db27a512a2f4d1a12e45069a2c12d3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5086,12 +5086,23 @@ static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb);
 
 /* Check if this incoming skb can be added to socket receive queues
  * while satisfying sk->sk_rcvbuf limit.
+ *
+ * In theory we should use skb->truesize, but this can cause problems
+ * when applications use too small SO_RCVBUF values.
+ * When LRO / hw gro is used, the socket might have a high tp->scaling_ratio,
+ * allowing RWIN to be close to available space.
+ * Whenever the receive queue gets full, we can receive a small packet
+ * filling RWIN, but with a high skb->truesize, because most NIC use 4K page
+ * plus sk_buff metadata even when receiving less than 1500 bytes of payload.
+ *
+ * Note that we use skb->len to decide to accept or drop this packet,
+ * but sk->sk_rmem_alloc is the sum of all skb->truesize.
+ */
 static bool tcp_can_ingest(const struct sock *sk, const struct sk_buff *skb)
 {
-	unsigned int new_mem = atomic_read(&sk->sk_rmem_alloc) + skb->truesize;
 
-	return new_mem <= sk->sk_rcvbuf;
+	unsigned int rmem = atomic_read(&sk->sk_rmem_alloc);
+
+	return rmem + skb->len <= sk->sk_rcvbuf;
 }
 
 static int tcp_try_rmem_schedule(struct sock *sk, const struct sk_buff *skb,
--
2.51.0.536.g15c5d4f767-goog