Message-ID: <Yrj0UWOQM8QNqJqu@d3>
Date: Mon, 27 Jun 2022 09:05:37 +0900
From: Benjamin Poirier <bpoirier@...dia.com>
To: Mike Manning <mvrmanning@...il.com>
Cc: Netdev <netdev@...r.kernel.org>, David Ahern <dsahern@...il.com>,
Saikrishna Arcot <sarcot@...rosoft.com>,
Craig Gallek <kraig@...gle.com>
Subject: Re: [PATCH] net: prefer socket bound to interface when not in VRF
On 2022-06-26 23:25 +0100, Mike Manning wrote:
> On 13/06/2022 04:14, Benjamin Poirier wrote:
> > On 2021-10-05 14:03 +0100, Mike Manning wrote:
[...]
> > Hi Mike,
> >
> > I was looking at this commit, 8d6c414cd2fb ("net: prefer socket bound to
> > interface when not in VRF"), and I get the feeling that it is only
> > partially effective. It works with UDP connected sockets but it doesn't
> > work for TCP and UDP unconnected sockets.
> >
> > The compute_score() functions are a bit misleading. Because of the
> > reuseport shortcut in their callers (inet_lhash2_lookup() and the like),
> > the first socket with score > 0 may be chosen, not necessarily the
> > socket with highest score. In order to prefer certain sockets, I think
> > an approach like commit d894ba18d4e4 ("soreuseport: fix ordering for
> > mixed v4/v6 sockets") would be needed. What do you think?
>
> Hi Benjamin,
>
> We had never observed any issues with any of our configurations. The VRF changes introduced
> in 7e225619e8af result in a failure being returned when there is no device match, which satisfies
> the requirements for VRF handling so unbound vs. bound to an l3mdev - the score is irrelevant.
> However, 8d6c414cd2fb was subsequently needed as unbound and bound sockets were scored
> equally, so that fix reinstated a higher score needed for sockets bound to an interface. Wrt
> your query, the scoring resolved the issue. I am unaware of any problematic use-cases, but in
> any case, my changes are in line with the current approach.

The problematic use case involves sockets that have SO_REUSEPORT +
SO_BINDTODEVICE. Earlier in the thread I've included a test that
demonstrates the issue.

For the Cumulus kernel I've put in place a workaround that removes the
reuseport optimization (see below). I probably won't have time to work
on a proper upstream solution.

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index b9d995b5ce24..1765ac837358 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -253,16 +253,20 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 	sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) {
 		score = compute_score(sk, net, hnum, daddr, dif, sdif);
 		if (score > hiscore) {
-			result = lookup_reuseport(net, sk, skb, doff,
-						  saddr, sport, daddr, hnum);
-			if (result)
-				return result;
-
 			result = sk;
 			hiscore = score;
 		}
 	}
+	if (result) {
+		struct sock *reuse_sk;
+
+		reuse_sk = lookup_reuseport(net, result, skb, doff,
+					    saddr, sport, daddr, hnum);
+		if (reuse_sk)
+			result = reuse_sk;
+	}
+
 	return result;
 }
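
To make the difference in ordering concrete, here is a simplified userspace model of the two loop shapes. It is a sketch under stated assumptions, not kernel code: struct model_sock, model_score(), model_lookup_reuseport(), lookup_early() and lookup_deferred() are all hypothetical names, the score values 1 and 2 are made up, and each reuseport socket is treated as its own group so that the reuseport stand-in simply returns the socket it was given.

```c
#include <stddef.h>

/* Simplified model of a listening socket; NOT the kernel's struct sock. */
struct model_sock {
	int bound_to_dev;	/* 1 if bound to the ingress device */
	int reuseport;		/* 1 if SO_REUSEPORT is set */
};

/* Stand-in for compute_score(): assume bound sockets score higher. */
static int model_score(const struct model_sock *sk)
{
	return sk->bound_to_dev ? 2 : 1;
}

/*
 * Stand-in for lookup_reuseport(): assume each reuseport socket is its
 * own group, so the "selected" member is the socket itself.
 */
static const struct model_sock *
model_lookup_reuseport(const struct model_sock *sk)
{
	return sk->reuseport ? sk : NULL;
}

/* Loop shape before the workaround: reuseport lookup inside the loop. */
static const struct model_sock *
lookup_early(const struct model_sock *socks, size_t n)
{
	const struct model_sock *result = NULL;
	int hiscore = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int score = model_score(&socks[i]);

		if (score > hiscore) {
			result = model_lookup_reuseport(&socks[i]);
			if (result)
				return result;	/* later sockets never scored */

			result = &socks[i];
			hiscore = score;
		}
	}
	return result;
}

/* Loop shape after the workaround: score everything, then reuseport. */
static const struct model_sock *
lookup_deferred(const struct model_sock *socks, size_t n)
{
	const struct model_sock *result = NULL;
	int hiscore = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int score = model_score(&socks[i]);

		if (score > hiscore) {
			result = &socks[i];
			hiscore = score;
		}
	}
	if (result) {
		const struct model_sock *reuse_sk;

		reuse_sk = model_lookup_reuseport(result);
		if (reuse_sk)
			result = reuse_sk;
	}
	return result;
}
```

With an unbound reuseport socket listed ahead of a device-bound reuseport socket, lookup_early() in this model returns the unbound one because it is the first with score > 0, while lookup_deferred() returns the bound one.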
> > Extra info:
> > 1) fcnal-test.sh results
> >
> > I tried to reproduce the fcnal-test.sh test results quoted above but in
> > my case the test cases already pass at 8d6c414cd2fb^ and 9e9fb7655ed5.
> > Moreover I believe those test cases don't have multiple listening
> > sockets. So that just added to my confusion.
>
> The fix was not targeting those 2 failed test cases, the output was only to show the before/after
> test results. It is unclear why they failed for me with the 9e9fb7655ed5 baseline,

Thanks for taking a look. That was also my guess.