[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADxym3Zv9+6D0hCEx1KzvW+oAQW5oEDgSHBQPyRjHodezH9O1g@mail.gmail.com>
Date: Sun, 3 Aug 2025 12:00:05 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: edumazet@...gle.com, kuniyu@...gle.com, ncardwell@...gle.com,
davem@...emloft.net, dsahern@...nel.org, kuba@...nel.org, pabeni@...hat.com,
horms@...nel.org, shuah@...nel.org, kraig@...gle.com,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH net v3 1/2] net: tcp: lookup the best matched listen socket
On Sun, Aug 3, 2025 at 10:54 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Sun, Aug 3, 2025 at 9:59 AM Menglong Dong <menglong8.dong@...il.com> wrote:
> >
> > On Sat, Aug 2, 2025 at 9:06 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > >
> > > Hi Menglong,
> > >
> > > On Sat, Aug 2, 2025 at 5:28 PM Menglong Dong <menglong8.dong@...il.com> wrote:
> > > >
> > > > For now, the tcp socket lookup will terminate if the socket is reuse port
> > > > in inet_lhash2_lookup(), which makes the socket is not the best match.
> > > >
> > > > For example, we have socket1 and socket2 both listen on "0.0.0.0:1234",
> > > > but socket1 bind on "eth0". We create socket1 first, and then socket2.
> > > > Then, all connections will goto socket2, which is not expected, as socket1
> > > > has higher priority.
> > > >
> > > > This can cause unexpected behavior if TCP MD5 keys is used, as described
> > > > in Documentation/networking/vrf.rst -> Applications.
> > > >
> > > > Therefor, we lookup the best matched socket first, and then do the reuse
> > >
> > > s/Therefor/Therefore
> > >
> > > > port logic. This can increase some overhead if there are many reuse port
> > > > socket :/
> > > >
> > > > Fixes: c125e80b8868 ("soreuseport: fast reuseport TCP socket selection")
> > > > Signed-off-by: Menglong Dong <dongml2@...natelecom.cn>
> > > > ---
> > > > v3:
> > > > * use the approach in V1
> > > > * add the Fixes tag
> > > > ---
> > > > net/ipv4/inet_hashtables.c | 13 +++++++------
> > > > net/ipv6/inet6_hashtables.c | 13 +++++++------
> > > > 2 files changed, 14 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> > > > index ceeeec9b7290..51751337f394 100644
> > > > --- a/net/ipv4/inet_hashtables.c
> > > > +++ b/net/ipv4/inet_hashtables.c
> > > > @@ -389,17 +389,18 @@ static struct sock *inet_lhash2_lookup(const struct net *net,
> > > > sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) {
> > > > score = compute_score(sk, net, hnum, daddr, dif, sdif);
> > > > if (score > hiscore) {
> > > > - result = inet_lookup_reuseport(net, sk, skb, doff,
> > > > - saddr, sport, daddr, hnum, inet_ehashfn);
> > > > - if (result)
> > > > - return result;
> > > > -
> > > > result = sk;
> > > > hiscore = score;
> > > > }
> > > > }
> > > >
> > > > - return result;
> > > > + if (!result)
> > > > + return NULL;
> > > > +
> > > > + sk = inet_lookup_reuseport(net, result, skb, doff,
> > > > + saddr, sport, daddr, hnum, inet_ehashfn);
> > > > +
> > > > + return sk ? sk : result;
> > > > }
> > >
> > > IMHO, I don't see it as a bugfix. So can you elaborate on what the exact
> > > side effect you're faced with is when the algorithm finally prefers
> > > socket2 (without
> > > this patch)?
> >
> > Hi, Jason. The case is that the user has several NIC,
> > and there are some sockets that are binded to them,
> > who listen on TCP port 6666. And a global socket doesn't
> > bind any NIC and listens on TCP port 6666.
> >
> > In theory, the connection request from the NIC will goto
> > the listen socket that is binded on it. When on socket
> > is binded on the NIC, it goto the global socket. However,
> > the connection request always goto the global socket, which
> > is not expected.
> >
> > What's worse is that when TCP MD5 is used on the socket,
> > the connection will fail :/
>
> I'm trying to picture what the usage can be in the userland as you
> pointed out in the MD5 case. As to the client side, it seems weird
> since it cannot detect and know the priority of the other side where a
> few sockets listen on the same address and port.
For the server side, it needs to add all the clients information
with the TCP_MD5SIG option. For socket1 that binded on the
eth0, it will add all the client addresses that come from eth0 to the
socket1 with TCP_MD5SIG. So the server knows the clients.
And in my use case, the TCP MD5 + VRF is used. The details are
a little fuzzy for me, which I need to do some recalling :/
>
> I'm not saying the priority problem doesn't exist, just not knowing
> how severe the case could be. It doesn't look that bad at least until
> now. Only the selection policy itself matters more to the server side
> than to the client side.
>
> Thanks,
> Jason
Powered by blists - more mailing lists