netdev - Re: [PATCH] net: prefer socket bound to interface when not in VRF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <Yrj0UWOQM8QNqJqu@d3>
Date:   Mon, 27 Jun 2022 09:05:37 +0900
From:   Benjamin Poirier <bpoirier@...dia.com>
To:     Mike Manning <mvrmanning@...il.com>
Cc:     Netdev <netdev@...r.kernel.org>, David Ahern <dsahern@...il.com>,
        Saikrishna Arcot <sarcot@...rosoft.com>,
        Craig Gallek <kraig@...gle.com>
Subject: Re: [PATCH] net: prefer socket bound to interface when not in VRF

On 2022-06-26 23:25 +0100, Mike Manning wrote:
> On 13/06/2022 04:14, Benjamin Poirier wrote:
> > On 2021-10-05 14:03 +0100, Mike Manning wrote:
[...]
> > Hi Mike,
> >
> > I was looking at this commit, 8d6c414cd2fb ("net: prefer socket bound to
> > interface when not in VRF"), and I get the feeling that it is only
> > partially effective. It works with UDP connected sockets but it doesn't
> > work for TCP and UDP unconnected sockets.
> >
> > The compute_score() functions are a bit misleading. Because of the
> > reuseport shortcut in their callers (inet_lhash2_lookup() and the like),
> > the first socket with score > 0 may be chosen, not necessarily the
> > socket with highest score. In order to prefer certain sockets, I think
> > an approach like commit d894ba18d4e4 ("soreuseport: fix ordering for
> > mixed v4/v6 sockets") would be needed. What do you think?
> 
> Hi Benjamin,
> 
> We had never observed any issues with any of our configurations. The VRF changes introduced
> 
> in 7e225619e8af result in a failure being returned when there is no device match, which satisfies
> 
> the requirements for VRF handling so unbound vs. bound to an l3mdev - the score is irrelevant.
> 
> However, 8d6c414cd2fb was subsequently needed as unbound and bound sockets were scored
> 
> equally, so that fix reinstated a higher score needed for sockets bound to an interface. Wrt to
> 
> your query, the scoring resolved the issue. I am unaware of any problematic use-cases, but in
> 
> any case, my changes are in line with the current approach.

The problematic use case involves sockets that have SO_REUSEPORT +
SO_BINDTODEVICE. Earlier in the thread I've included a test that
demonstrates the issue.

For the Cumulus kernel I've put in place a workaround that removes the
reuseport optimization (see below). I probably won't have time to work
on a proper upstream solution.

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index b9d995b5ce24..1765ac837358 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -253,16 +253,20 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 	sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) {
 		score = compute_score(sk, net, hnum, daddr, dif, sdif);
 		if (score > hiscore) {
-			result = lookup_reuseport(net, sk, skb, doff,
-						  saddr, sport, daddr, hnum);
-			if (result)
-				return result;
-
 			result = sk;
 			hiscore = score;
 		}
 	}
 
+	if (result) {
+		struct sock *reuse_sk;
+
+		reuse_sk = lookup_reuseport(net, result, skb, doff,
+					    saddr, sport, daddr, hnum);
+		if (reuse_sk)
+			result = reuse_sk;
+	}
+
 	return result;
 }
 
> > Extra info:
> > 1) fcnal-test.sh results
> >
> > I tried to reproduce the fcnal-test.sh test results quoted above but in
> > my case the test cases already pass at 8d6c414cd2fb^ and 9e9fb7655ed5.
> > Moreover I believe those test cases don't have multiple listening
> > sockets. So that just added to my confusion.
> 
> The fix was not targeting those 2 failed test cases, the output was only to show the before/after
> 
> test results. It is unclear why they failed for me with with the 9e9fb7655ed5 baseline,

Thanks for taking a look, that was also my guess.