Message-ID: <ad8dec4c-d3e0-46c6-a943-c7f3c786c802@suse.de>
Date: Wed, 4 Feb 2026 17:25:18 +0100
From: Fernando Fernandez Mancera <fmancera@...e.de>
To: netdev@...r.kernel.org
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, horms@...nel.org, corbet@....net, ncardwell@...gle.com,
kuniyu@...gle.com, dsahern@...nel.org, idosch@...dia.com,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
Thorsten Toepper <thorsten.toepper@....com>
Subject: Re: [PATCH RFC net-next] inet: add ip_retry_random_port sysctl to
reduce sequential port retries
On 2/3/26 6:54 PM, Fernando Fernandez Mancera wrote:
> With the current port selection algorithm, ports that follow a reserved
> or long-lived port are selected more often than others. Some cloud
> environments block connections between an application server and a
> database server when a previous connection used the same source port.
> Combined, these two effects lead to connectivity problems for
> applications in cloud environments.
>
> The situation is that a source tuple is usable again after being closed
> for a maximum segment lifetime of two minutes, while the firewall still
> records it as existing for 60 minutes or longer. If the port is reused
> for the same target tuple before the firewall cleans up, the connection
> fails due to firewall interference, which also resets the activity
> timeout in the firewall's own table. We understand that the real issue
> is that these firewalls cannot cope with standards-compliant port reuse.
> However, this is a workaround for such situations, and it also improves
> the distribution of selected ports.
>
> Instead of incrementing the port number, the proposed solution re-selects
> a new random port within the remaining range. This behaviour is controlled
> by the new sysctl option "net.ipv4.ip_retry_random_port".
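The following user-space toy model compares the two retry strategies described above. It is a sketch under stated assumptions, not the kernel code. The `busy[]` array stands in for reserved or still-in-use ports, and `RANGE` is an arbitrary small range. `xorshift32()` is a hypothetical stand-in for `get_random_u32_below()`, used only so that runs are reproducible. `retries_until_free()` is a name invented here. The kernel's parity handling is ignored, so this models step 1 only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: offsets 0..RANGE-1 stand for ports low..low+RANGE-1. */
#define RANGE 1000

/* xorshift32 PRNG (hypothetical stand-in for get_random_u32_below()),
 * used so that results are reproducible. The state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	assert(x != 0);
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Count the busy slots hit before a free one is found, starting at
 * offset start. With random_retry false, the next offset is tried
 * (sequential behaviour, step 1). With random_retry true, a new random
 * offset is drawn after every busy slot (proposed behaviour). The number
 * of attempts is bounded by RANGE; -1 means no free slot was found
 * within RANGE attempts.
 */
static int retries_until_free(const bool busy[RANGE], uint32_t start,
			      bool random_retry, uint32_t *rng)
{
	uint32_t off = start % RANGE;

	for (int i = 0; i < RANGE; i++) {
		if (!busy[off])
			return i;
		if (random_retry)
			off = xorshift32(rng) % RANGE;
		else
			off = (off + 1) % RANGE;
	}
	return -1;
}
```

Suppose slots 100-299 are marked busy and the search starts at slot 100. The sequential strategy then always needs 200 retries, because it must walk the whole contiguous block. Random re-selection needs about 1.25 retries on average, since each fresh draw hits a busy slot only 20% of the time. This mirrors why the "longest retry sequence" drops so sharply in the results below.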
>
> The test consists of two processes, a client and a server: the client
> connects to the server in a loop, and some bytes are sent back. The
> results we got are promising:
>
> Executed test: Current algorithm
> ephemeral port range: 9000-65499
> simulated selections: 10000000
> retries during simulation: 14197718
> longest retry sequence: 5202
>
> Executed test: Proposed modified algorithm
> ephemeral port range: 9000-65499
> simulated selections: 10000000
> retries during simulation: 3976671
> longest retry sequence: 12
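Dividing the reported retry counts by the number of simulated selections gives the average number of retries per selection. This is only arithmetic on the figures above, not an additional measurement:

```latex
\begin{aligned}
\text{current:}\quad  & \frac{14\,197\,718}{10\,000\,000} \approx 1.42 \ \text{retries per selection} \\
\text{proposed:}\quad & \frac{3\,976\,671}{10\,000\,000} \approx 0.40 \ \text{retries per selection} \\
\text{reduction:}\quad & 1 - \frac{3\,976\,671}{14\,197\,718} \approx 0.720 \ (\approx 72\%)
\end{aligned}
```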
>
> In addition, the generated graphs show that the distribution of source
> ports is more even with the proposed patch.
>
> Signed-off-by: Fernando Fernandez Mancera <fmancera@...e.de>
> Tested-by: Thorsten Toepper <thorsten.toepper@....com>
> ---
> .../networking/net_cachelines/netns_ipv4_sysctl.rst | 1 +
> include/net/netns/ipv4.h | 1 +
> net/ipv4/inet_hashtables.c | 7 ++++++-
> net/ipv4/sysctl_net_ipv4.c | 7 +++++++
> 4 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
> index beaf1880a19b..c4041fdca01e 100644
> --- a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
> +++ b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
> @@ -47,6 +47,7 @@ u8 sysctl_tcp_ecn
> u8 sysctl_tcp_ecn_fallback
> u8 sysctl_ip_default_ttl ip4_dst_hoplimit/ip_select_ttl
> u8 sysctl_ip_no_pmtu_disc
> +u8 sysctl_ip_retry_random_port
> u8 sysctl_ip_fwd_use_pmtu read_mostly ip_dst_mtu_maybe_forward/ip_skb_dst_mtu
> u8 sysctl_ip_fwd_update_priority ip_forward
> u8 sysctl_ip_nonlocal_bind
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index 2dbd46fc4734..d04b07e7c935 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -156,6 +156,7 @@ struct netns_ipv4 {
>
> u8 sysctl_ip_default_ttl;
> u8 sysctl_ip_no_pmtu_disc;
> + u8 sysctl_ip_retry_random_port;
> u8 sysctl_ip_fwd_update_priority;
> u8 sysctl_ip_nonlocal_bind;
> u8 sysctl_ip_autobind_reuse;
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index f5826ec4bcaa..f1c79a7d3fd3 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -1088,8 +1088,13 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
> for (i = 0; i < remaining; i += step, port += step) {
> if (unlikely(port >= high))
> port -= remaining;
> - if (inet_is_local_reserved_port(net, port))
> + if (inet_is_local_reserved_port(net, port)) {
> + if (net->ipv4.sysctl_ip_retry_random_port) {
> + port = low + get_random_u32_below(remaining);
> + port = ((port & 1) == step) ? port : (port - 1);
The AI bot made a good observation
(https://netdev-ai.bots.linux.dev/ai-review.html?id=c1544ebc-4c9d-45c5-bce9-784764102912).
I think the following would be better, as it keeps the random scan within
the same parity when needed.
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index f1c79a7d3fd3..c9650079f9e5 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -1090,8 +1090,11 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
port -= remaining;
if (inet_is_local_reserved_port(net, port)) {
if (net->ipv4.sysctl_ip_retry_random_port) {
- port = low + get_random_u32_below(remaining);
- port = ((port & 1) == step) ? port : (port - 1);
+ u32 candidate = low + get_random_u32_below(remaining);
+
+ if (step == 2 && (candidate & 1) != (port & 1))
+ candidate++;
+ port = candidate;
}
continue;
}
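Below is a small user-space check of the two parity adjustments, lifted out of `__inet_hash_connect()` into helpers whose names (`v1_adjust()`, `v2_adjust()`, `keeps_parity()`) are invented here. The sketch assumes `step` is either 1 or 2.

`port & 1` is always 0 or 1, so the v1 condition can never be true when `step == 2`. As a result, the candidate is always decremented, which flips its parity. A candidate equal to `low` ends up at `low - 1`, outside the range. The revised logic only bumps the candidate when its parity differs from the current port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Expression from the originally posted hunk (its "port" renamed to
 * "candidate"), taken out of the kernel loop for testing. */
static uint32_t v1_adjust(uint32_t candidate, uint32_t step)
{
	return ((candidate & 1) == step) ? candidate : (candidate - 1);
}

/* Logic from the revised hunk: only when scanning one parity
 * (step == 2) is the candidate nudged to match the current port. */
static uint32_t v2_adjust(uint32_t candidate, uint32_t port, uint32_t step)
{
	assert(step == 1 || step == 2); /* assumption of this sketch */
	if (step == 2 && (candidate & 1) != (port & 1))
		candidate++;
	return candidate;
}

/* True if every candidate in [low, low + remaining), after adjustment
 * with step 2, has the same parity as port. */
static bool keeps_parity(int version, uint32_t low, uint32_t remaining,
			 uint32_t port)
{
	for (uint32_t c = low; c < low + remaining; c++) {
		uint32_t out = (version == 1) ? v1_adjust(c, 2)
					      : v2_adjust(c, port, 2);

		if ((out & 1) != (port & 1))
			return false;
	}
	return true;
}
```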
> + }
> continue;
> + }
> head = &hinfo->bhash[inet_bhashfn(net, port,
> hinfo->bhash_size)];
> rcu_read_lock();
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index a1a50a5c80dc..5eade7d9e4a2 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -822,6 +822,13 @@ static struct ctl_table ipv4_net_table[] = {
> .mode = 0644,
> .proc_handler = ipv4_local_port_range,
> },
> + {
> + .procname = "ip_retry_random_port",
> + .maxlen = sizeof(u8),
> + .data = &init_net.ipv4.sysctl_ip_retry_random_port,
> + .mode = 0644,
> + .proc_handler = proc_dou8vec_minmax,
> + },
> {
> .procname = "ip_local_reserved_ports",
> .data = &init_net.ipv4.sysctl_local_reserved_ports,