Message-ID: <fa753eac-3dd4-40d0-861e-3768d2ec2ddd@redhat.com>
Date: Tue, 23 Sep 2025 09:45:11 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Jakub Sitnicki <jakub@...udflare.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Kuniyuki Iwashima <kuniyu@...gle.com>, Neal Cardwell <ncardwell@...gle.com>,
kernel-team@...udflare.com, Lee Valentine <lvalentine@...udflare.com>
Subject: Re: [PATCH net-next v4 1/2] tcp: Update bind bucket state on port release
Hi,
I'm sorry for the latency, I got lost in pending threads.
On 9/16/25 3:14 PM, Jakub Sitnicki wrote:
> On Tue, Sep 16, 2025 at 12:14 PM +02, Paolo Abeni wrote:
>> On 9/13/25 12:09 PM, Jakub Sitnicki wrote:
>>> Today, once an inet_bind_bucket enters a state where fastreuse >= 0 or
>>> fastreuseport >= 0 after a socket is explicitly bound to a port, it remains
>>> in that state until all sockets are removed and the bucket is destroyed.
>>>
>>> In this state, the bucket is skipped during ephemeral port selection in
>>> connect(). For applications using a reduced ephemeral port
>>> range (IP_LOCAL_PORT_RANGE socket option), this can cause faster port
>>> exhaustion since blocked buckets are excluded from reuse.
>>>
>>> The reason the bucket state isn't updated on port release is unclear.
>>> Possibly a performance trade-off to avoid scanning bucket owners, or just
>>> an oversight.
>>>
>>> Fix it by recalculating the bucket state when a socket releases a port. To
>>> limit overhead, each inet_bind2_bucket stores its own (fastreuse,
>>> fastreuseport) state. On port release, only the relevant port-addr bucket
>>> is scanned, and the overall state is derived from these.
>>
>> I'm possibly lost, but I think that the bucket state could change
>> even after inet_bhash2_update_saddr(), but AFAICS it's not updated there.
>
> Let me double check if I understand what you have in mind because now I
> also feel a bit lost :-)
>
> We already update the bucket state in inet_bhash2_update_saddr(). I
> assume we are talking about the main body, not the early bailout path
> when the socket is not bound yet [1].
>
> This code gets called only in the obscure (?) case when ip_dynaddr [2]
> sysctl is set, and we have a routing failure during connection setup
> phase (SYN-SENT).
>
> In such case, on source address update, call to
> inet_bind2_bucket_destroy() will recalculate port-addr bucket state,
> potentially "downgrading" it to (fastreuse=-1, fastreuseport=-1).
>
> But if the "downgrade" happens, it changes nothing for the port bucket
> state, as we are about to re-add the socket into another port-addr
> bucket.
This was indeed the path I was looking for. I lost track of the fact
that the port bucket affected by the removal and the re-add is the same,
so its state does not change.
It's clear now that you pointed that out, thanks!
Paolo
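
The point settled above, that removing a socket from one port-addr bucket
and re-adding it to another under the same port leaves the port bucket
state unchanged, can be illustrated with a toy model. This is only a
sketch under simplifying assumptions, not the kernel code or the patch:
every toy_* name is hypothetical, a single "state" field stands in for the
(fastreuse, fastreuseport) pair, and a bucket is treated as -1 only when
none of its owners was explicitly bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OWNERS 4
#define MAX_ADDRS  4

/* Hypothetical toy socket: records only how it obtained its port. */
struct toy_sock {
	bool explicit_bind; /* true: bind(); false: auto-selected in connect() */
};

/* Toy stand-in for a port-addr bucket (inet_bind2_bucket in the thread). */
struct toy_bind2_bucket {
	const struct toy_sock *owners[MAX_OWNERS];
	size_t nr_owners;
	int state; /* -1: no explicitly bound owner, 0: at least one (simplified) */
};

/* Toy stand-in for a port bucket (inet_bind_bucket in the thread). */
struct toy_bind_bucket {
	struct toy_bind2_bucket addrs[MAX_ADDRS];
	int state; /* derived from the non-empty port-addr buckets */
};

/* Scan the owners of one port-addr bucket only. */
void toy_recalc_bind2(struct toy_bind2_bucket *tb2)
{
	tb2->state = -1;
	for (size_t i = 0; i < tb2->nr_owners; i++)
		if (tb2->owners[i]->explicit_bind)
			tb2->state = 0;
}

/* Derive the port state from the cached per-address states, without
 * rescanning sockets. Empty buckets are skipped; the kernel would
 * destroy them instead. */
void toy_recalc_port(struct toy_bind_bucket *tb)
{
	tb->state = -1;
	for (size_t i = 0; i < MAX_ADDRS; i++)
		if (tb->addrs[i].nr_owners > 0 && tb->addrs[i].state == 0)
			tb->state = 0;
}

bool toy_add_owner(struct toy_bind_bucket *tb, size_t addr,
		   const struct toy_sock *sk)
{
	if (addr >= MAX_ADDRS || tb->addrs[addr].nr_owners >= MAX_OWNERS)
		return false;
	struct toy_bind2_bucket *tb2 = &tb->addrs[addr];
	tb2->owners[tb2->nr_owners++] = sk;
	toy_recalc_bind2(tb2);
	toy_recalc_port(tb);
	return true;
}

/* Port release: drop the owner, then recalculate both levels. */
bool toy_del_owner(struct toy_bind_bucket *tb, size_t addr,
		   const struct toy_sock *sk)
{
	if (addr >= MAX_ADDRS)
		return false;
	struct toy_bind2_bucket *tb2 = &tb->addrs[addr];
	for (size_t i = 0; i < tb2->nr_owners; i++) {
		if (tb2->owners[i] != sk)
			continue;
		tb2->owners[i] = tb2->owners[--tb2->nr_owners];
		toy_recalc_bind2(tb2);
		toy_recalc_port(tb);
		return true;
	}
	return false;
}

/* Source address update: same port bucket, different port-addr bucket. */
bool toy_move_owner(struct toy_bind_bucket *tb, size_t from, size_t to,
		    const struct toy_sock *sk)
{
	return toy_del_owner(tb, from, sk) && toy_add_owner(tb, to, sk);
}
```

In this model the port state depends only on the set of sockets that own
the port, so toy_move_owner() can change the state of the two port-addr
buckets involved but never the state of the port bucket, whereas
toy_del_owner() of the last explicitly bound socket brings the port bucket
back to -1.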