[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <878qkzgi4i.fsf@cloudflare.com>
Date: Tue, 08 Jul 2025 11:13:33 +0200
From: Jakub Sitnicki <jakub@...udflare.com>
To: Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
"David S. Miller" <davem@...emloft.net>, Jakub Kicinski
<kuba@...nel.org>, Neal Cardwell <ncardwell@...gle.com>, Kuniyuki
Iwashima <kuniyu@...gle.com>
Cc: netdev@...r.kernel.org, kernel-team@...udflare.com, Lee Valentine
<lvalentine@...udflare.com>
Subject: Re: [PATCH net-next v2 1/2] tcp: Consider every port when
connecting with IP_LOCAL_PORT_RANGE
On Thu, Jul 03, 2025 at 05:59 PM +02, Jakub Sitnicki wrote:
> Situation
> ---------
>
> We observe the following scenario in production:
>
> inet_bind_bucket
> state for port 54321
> --------------------
>
> (bucket doesn't exist)
>
> // Process A opens a long-lived connection:
> s1 = socket(AF_INET, SOCK_STREAM)
> s1.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> s1.setsockopt(IP_LOCAL_PORT_RANGE, 54000..54500)
> s1.bind(192.0.2.10, 0)
> s1.connect(192.51.100.1, 443)
> tb->reuse = -1
> tb->reuseport = -1
> s1.getsockname() -> 192.0.2.10:54321
> s1.send()
> s1.recv()
> // ... s1 stays open.
>
> // Process B opens a short-lived connection:
> s2 = socket(AF_INET, SOCK_STREAM)
> s2.setsockopt(SO_REUSEADDR)
> s2.bind(192.0.2.20, 0)
> tb->reuse = 0
> tb->reuseport = 0
> s2.connect(192.51.100.2, 53)
> s2.getsockname() -> 192.0.2.20:54321
> s2.send()
> s2.recv()
> s2.close()
>
> // bucket remains in this
> // state even though port
> // was released by s2
> tb->reuse = 0
> tb->reuseport = 0
>
> // Process A attempts to open another connection
> // when there is connection pressure from
> // 192.0.2.30:54000..54500 to 192.51.100.1:443.
> // Assume only port 54321 is still available.
>
> s3 = socket(AF_INET, SOCK_STREAM)
> s3.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> s3.setsockopt(IP_LOCAL_PORT_RANGE, 54000..54500)
> s3.bind(192.0.2.30, 0)
> s3.connect(192.51.100.1, 443) -> EADDRNOTAVAIL (99)
>
> Problem
> -------
>
> We end up in a state where Process A can't reuse ephemeral port 54321 for
> as long as there are sockets, like s1, that keep the bind bucket alive. The
> bucket does not return to "reusable" state even when all sockets which
> blocked it from reuse, like s2, are gone.
>
> The ephemeral port becomes available for use again only after all sockets
> bound to it are gone and the bind bucket is destroyed.
>
> Programs which behave like Process B in this scenario - that is, binding to
> an IP address without setting IP_BIND_ADDRESS_NO_PORT - might be considered
> poorly written. However, the reality is that such implementation is not
> actually uncommon. Trying to fix each and every such program is like
> playing whack-a-mole.
>
> For instance, it could be any software using Golang's net.Dialer with
> LocalAddr provided:
>
> dialer := &net.Dialer{
> LocalAddr: &net.TCPAddr{IP: srcIP},
> }
> conn, err := dialer.Dial("tcp4", dialTarget)
>
> Or even a ubiquitous tool like dig when using a specific local address:
>
> $ dig -b 127.1.1.1 +tcp +short example.com
>
> Hence, we are proposing a systematic fix in the network stack itself.
>
> Solution
> --------
>
> If there is no IP address conflict with any socket bound to a given local
> port, then from the protocol's perspective, the port can be safely shared.
>
> With that in mind, modify the port search during connect(), that is
> __inet_hash_connect, to consider all bind buckets (ports) when looking for
> a local port for egress.
>
> To achieve this, add an extra walk over bhash2 buckets for the port to
> check for IP conflicts. The additional walk is not free, so perform it only
> once per port - during the second phase of conflict checking, when the
> bhash bucket is locked.
>
> We enable this changed behavior only if the IP_LOCAL_PORT_RANGE socket
> option is set. The rationale is that users are likely to care about using
> every possible local port only when they have deliberately constrained the
> ephemeral port range.
I was wondering what the maintainers' outlook on this change is:
Does this sound acceptable?
Or should we start looking in a different direction?
Thanks,
-jkbs
[...]
Powered by blists - more mailing lists