Message-ID: <20250620125644.1045603-2-ptesarik@suse.com>
Date: Fri, 20 Jun 2025 14:56:43 +0200
From: Petr Tesarik <ptesarik@...e.com>
To: Paolo Abeni <pabeni@...hat.com>,
    "David S. Miller" <davem@...emloft.net>,
    Eric Dumazet <edumazet@...gle.com>,
    Neal Cardwell <ncardwell@...gle.com>,
    Kuniyuki Iwashima <kuniyu@...gle.com>,
    netdev@...r.kernel.org (open list:NETWORKING [TCP])
Cc: David Ahern <dsahern@...nel.org>,
    Jakub Kicinski <kuba@...nel.org>,
    linux-kernel@...r.kernel.org (open list),
    Petr Tesarik <ptesarik@...e.com>
Subject: [PATCH net v2 1/2] tcp_metrics: set congestion window clamp from the dst entry

If RTAX_CWND is locked, always initialize tp->snd_cwnd_clamp from the
corresponding dst entry.

Note that an unlocked RTAX_CWND does not have any effect on the kernel.
This behavior is even documented in the manual page of ip-route(8):

    cwnd NUMBER (Linux 2.3.15+ only)
        the clamp for congestion window. It is ignored if the lock
        flag is not used.

An unlocked RTAX_CWND was updated by tcp_update_metrics() until v3.6. Since
then, only the newly introduced TCP metric (TCP_METRIC_CWND) has been
updated, rendering unlocked RTAX_CWND useless.
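
As an illustration of the locked/unlocked distinction described above, here
is a minimal user-space sketch. It is not kernel code: struct toy_dst, the
TOY_* constants and the toy_* helpers are hypothetical stand-ins, and the
bit layout (one lock bit per metric, stored in a separate lock metric) is an
assumption modeled on how dst_metric_locked() is commonly understood to
work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical metric indices for this sketch only. */
enum { TOY_RTAX_LOCK = 1, TOY_RTAX_CWND = 7, TOY_RTAX_SLOTS = 16 };

/* Toy route entry: one slot per metric, indexed directly by metric id. */
struct toy_dst {
	uint32_t metrics[TOY_RTAX_SLOTS];
};

static uint32_t toy_dst_metric(const struct toy_dst *dst, int metric)
{
	assert(metric > 0 && metric < TOY_RTAX_SLOTS);
	return dst->metrics[metric];
}

/* A metric is "locked" when its bit is set in the lock metric. */
static bool toy_dst_metric_locked(const struct toy_dst *dst, int metric)
{
	return (toy_dst_metric(dst, TOY_RTAX_LOCK) & (1u << metric)) != 0;
}

/*
 * Clamp that a new connection would start with: the route's cwnd value
 * if it is locked, otherwise the caller-supplied default is kept and the
 * unlocked value is ignored.
 */
static uint32_t toy_cwnd_clamp(const struct toy_dst *dst, uint32_t dflt)
{
	if (toy_dst_metric_locked(dst, TOY_RTAX_CWND))
		return toy_dst_metric(dst, TOY_RTAX_CWND);
	return dflt;
}
```

With metrics[TOY_RTAX_CWND] set to 10 but no lock bit, toy_cwnd_clamp()
returns the default unchanged; once the lock bit is set, it returns 10.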

TCP metrics are updated after a TCP connection finishes. If there are no
metrics for a given destination when a new connection is created, default
values are used instead.

This means there are two issues with setting tp->snd_cwnd_clamp from the
TCP metric:

1. If the cwnd option is changed in the routing table, the new value is not
   used for new connections as long as there is a cached TCP metric for the
   destination. An existing cached metric is not updated from the routing
   table unless it has seen no update for longer than TCP_METRICS_TIMEOUT
   (1 hour).

2. After evicting the corresponding cached metric, the new value from the
   routing table is still not used for new connections until one connection
   finishes, and a new cached entry is created.

As a result, the following sequence of steps is required to set a new
locked cwnd clamp:

- update the route (``ip route replace ... cwnd lock $value``)
- flush any existing TCP metric entry (``ip tcp_metrics flush $dest``)
- create and finish a dummy connection to the destination to create a TCP
  metric entry with the new value
- the *next* connection to this destination will use the new value

The above does not seem to be intentional.
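
The sequence above can be walked through with a small user-space model.
This is only a sketch of the behavior as described in this message, not
kernel code: struct toy_route, struct toy_cache, TOY_NO_CLAMP and all toy_*
functions are hypothetical, and the cache is assumed to copy the route's
value only when a new entry is created after a connection finishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NO_CLAMP UINT32_MAX		/* stand-in for "no clamp set" */

struct toy_route { bool cwnd_locked; uint32_t cwnd; };
struct toy_cache { bool present; bool cwnd_locked; uint32_t cwnd; };

/* Before the patch: the clamp comes only from the cached TCP metric. */
static uint32_t toy_clamp_before(const struct toy_route *rt,
				 const struct toy_cache *tm)
{
	(void)rt;
	if (tm->present && tm->cwnd_locked)
		return tm->cwnd;
	return TOY_NO_CLAMP;
}

/* After the patch: the clamp comes from the route whenever it is locked. */
static uint32_t toy_clamp_after(const struct toy_route *rt,
				const struct toy_cache *tm)
{
	(void)tm;
	if (rt->cwnd_locked)
		return rt->cwnd;
	return TOY_NO_CLAMP;
}

/* A finishing connection creates a cache entry only if none exists. */
static void toy_connection_finished(const struct toy_route *rt,
				    struct toy_cache *tm)
{
	if (!tm->present) {
		tm->present = true;
		tm->cwnd_locked = rt->cwnd_locked;
		tm->cwnd = rt->cwnd;
	}
}

static void toy_flush(struct toy_cache *tm)
{
	tm->present = false;
}

/* Replays the steps from the list above against the model. */
void toy_scenario(void)
{
	struct toy_route rt = { true, 10 };
	struct toy_cache tm = { false, false, 0 };

	toy_connection_finished(&rt, &tm);	/* cache now holds 10 */
	rt.cwnd = 20;				/* route updated to 20 */
	assert(toy_clamp_before(&rt, &tm) == 10);	/* stale cache wins */
	assert(toy_clamp_after(&rt, &tm) == 20);	/* route wins */

	toy_flush(&tm);				/* flush the cached metric */
	assert(toy_clamp_before(&rt, &tm) == TOY_NO_CLAMP);

	toy_connection_finished(&rt, &tm);	/* dummy connection */
	assert(toy_clamp_before(&rt, &tm) == 20);	/* finally picked up */
}
```

In this model the pre-patch lookup needs the flush plus a finished dummy
connection before it returns 20, while the post-patch lookup returns 20
immediately after the route is changed.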

NB: there is also an initcwnd route parameter (RTAX_INITCWND) to set the
initial size of the congestion window; this patch does not change anything
about the handling of that parameter.

Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Petr Tesarik <ptesarik@...e.com>
---
net/ipv4/tcp_metrics.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 4251670e328c..dd8f3457bd72 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
 	if (!dst)
 		goto reset;
 
+	if (dst_metric_locked(dst, RTAX_CWND))
+		tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
+
 	rcu_read_lock();
 	tm = tcp_get_metrics(sk, dst, false);
 	if (!tm) {
@@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
 		goto reset;
 	}
 
-	if (tcp_metric_locked(tm, TCP_METRIC_CWND))
-		tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
-
 	val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
 	      0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
 	if (val) {
--
2.49.0