Message-ID: <Z9xmvRX_g_ZifayA@shredder>
Date: Thu, 20 Mar 2025 21:04:29 +0200
From: Ido Schimmel <idosch@...sch.org>
To: Simon Horman <horms@...nel.org>, gnault@...hat.com
Cc: Stanislav Fomichev <stfomichev@...il.com>,
	David Miller <davem@...emloft.net>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
	David Ahern <dsahern@...nel.org>,
	Antonio Quartulli <antonio@...delbit.com>,
	Petr Machata <petrm@...dia.com>
Subject: Re: [PATCH net v4 1/2] gre: Fix IPv6 link-local address generation.

On Thu, Mar 20, 2025 at 04:26:46PM +0000, Simon Horman wrote:
> On Mon, Mar 17, 2025 at 10:10:45PM +0100, Guillaume Nault wrote:
> > On Sun, Mar 16, 2025 at 03:08:48PM +0200, Ido Schimmel wrote:
> > > On Fri, Mar 14, 2025 at 01:18:21PM -0700, Stanislav Fomichev wrote:
> > > > On 03/14, Guillaume Nault wrote:
> > > > > On Fri, Mar 14, 2025 at 08:18:32AM -0700, Stanislav Fomichev wrote:
> > > > > > 
> > > > > > Could you please double check net/forwarding/ip6gre_custom_multipath_hash.sh?
> > > > > > It seems like it started failing after this series was pulled:
> > > > > > https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/31301/2-ip6gre-custom-multipath-hash-sh/stdout
> > > > > 
> > > > > Hum, net/forwarding/ip6gre_custom_multipath_hash.sh works for me on the
> > > > > current net tree (I'm at commit 4003c9e78778). I have only one failure,
> > > > > but it already happened before 183185a18ff9 ("gre: Fix IPv6 link-local
> > > > > address generation.") was applied.
> > > > 
> > > > On my side I see the following (ignore ping6 FAILs):
> > > > 
> > > > bfc6c67ec2d6 - (net-next/main, net-next/HEAD) net/smc: use the correct ndev to find pnetid by pnetid table (7 hours ago) <Guangguan Wang>
> > > > 
> > > > TAP version 13
> > > > 1..1
> > > > # timeout set to 0
> > > > # selftests: net/forwarding: ip6gre_custom_multipath_hash.sh
> > > > [    9.275735][  T167] ip (167) used greatest stack depth: 23536 bytes left
> > > > [   13.769300][  T255] gre: GRE over IPv4 demultiplexor driver
> > > > [   13.838185][  T255] ip6_gre: GRE over IPv6 tunneling driver
> > > > [   13.951780][   T12] ip6_tunnel: g1 xmit: Local address not yet configured!
> > > > [   14.038101][   T12] ip6_tunnel: g1 xmit: Local address not yet configured!
> > > > [   15.148469][  T281] 8021q: 802.1Q VLAN Support v1.8
> > > > [   17.559477][  T321] GACT probability NOT on
> > > > [   18.551876][   T12] ip6_tunnel: g2 xmit: Local address not yet configured!
> > > > [   18.633656][   T12] ip6_tunnel: g2 xmit: Local address not yet configured!
> > > > # TEST: ping                                                          [ OK ]
> > > > # TEST: ping6                                                         [FAIL]
> > > > # INFO: Running IPv4 overlay custom multipath hash tests
> > > > # TEST: Multipath hash field: Inner source IP (balanced)              [FAIL]
> > > > #       Expected traffic to be balanced, but it is not
> > > > # INFO: Packets sent on path1 / path2: 1 / 12602
> > > > # TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
> > > > # INFO: Packets sent on path1 / path2: 0 / 12601
> > > > # TEST: Multipath hash field: Inner destination IP (balanced)         [FAIL]
> > > > #       Expected traffic to be balanced, but it is not
> > > > # INFO: Packets sent on path1 / path2: 1 / 12600
> > > > # TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
> > > > # INFO: Packets sent on path1 / path2: 0 / 12600
> > > > ...
> > > > 
> > > > 8ecea691e844 - (HEAD -> upstream/net-next/main) Revert "gre: Fix IPv6 link-local address generation." (2 minutes ago) <Stanislav Fomichev>
> > > > 
> > > > TAP version 13
> > > > 1..1
> > > > # timeout set to 0
> > > > # selftests: net/forwarding: ip6gre_custom_multipath_hash.sh
> > > > [   13.863060][  T252] gre: GRE over IPv4 demultiplexor driver
> > > > [   13.911551][  T252] ip6_gre: GRE over IPv6 tunneling driver
> > > > [   15.226124][  T277] 8021q: 802.1Q VLAN Support v1.8
> > > > [   17.629460][  T317] GACT probability NOT on
> > > > [   17.645781][  T315] tc (315) used greatest stack depth: 23040 bytes left
> > > > # TEST: ping                                                          [ OK ]
> > > > # TEST: ping6                                                         [FAIL]
> > > > # INFO: Running IPv4 overlay custom multipath hash tests
> > > > # TEST: Multipath hash field: Inner source IP (balanced)              [ OK ]
> > > > # INFO: Packets sent on path1 / path2: 5552 / 7052
> > > > # TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
> > > > # INFO: Packets sent on path1 / path2: 12600 / 2
> > > > [   36.278056][    C2] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 1078005296 wd_nsec: 1078004682
> > > > # TEST: Multipath hash field: Inner destination IP (balanced)         [ OK ]
> > > > # INFO: Packets sent on path1 / path2: 6650 / 5950
> > > > # TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
> > > > # INFO: Packets sent on path1 / path2: 0 / 12600
> > > > ...
> > > > 
> > > > And I also see the failures on 4003c9e78778. Not sure why we see
> > > > different results. And the NIPA runs fail as well:
> > > > 
> > > > https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/32922/1-ip6gre-custom-multipath-hash-sh/stdout
> > > 
> > > I can reproduce this locally and I'm getting the exact same result as
> > > the CI. All the balanced tests fail because the traffic is forwarded via
> > > a single nexthop. No failures after reverting 183185a18ff9.
> > > 
> > > I'm still not sure what is happening, but for some reason a neighbour is
> > > not created on one of the nexthop devices, which causes rt6_check_neigh()
> > > to skip over this path (returning RT6_NUD_FAIL_DO_RR). Enabling
> > > CONFIG_IPV6_ROUTER_PREF fixes the issue because then RT6_NUD_SUCCEED is
> > > returned.
> > > 
> > > I can continue looking into this on Tuesday (mostly AFK tomorrow).
> > 
> > I finally managed to reproduce the problem using vng. Still no problem
> > on my regular VM, whether I enable CONFIG_IPV6_ROUTER_PREF or not.
> > I'll continue investigating this problem...
> 
> FWIW, I have tried a lot, but am unable to _reliably_ reproduce this problem.

Sorry for the delay. Busy with other tasks at the moment, but I found
some time to look into this. I believe I understand the issue and have a
fix. Guillaume's patch is fine. It simply exposed a bug elsewhere.

The test is failing because all the packets are forwarded via a single
path instead of being load balanced between both paths.
fib6_select_path() chooses the path according to the hash-threshold
algorithm. If the function is called with the last nexthop of a
multipath route, it will always choose this nexthop: the last nexthop's
upper bound covers the end of the hash space, so the calculated hash
can never exceed it.
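
To make the failure mode concrete, here is a minimal userspace sketch of
the hash-threshold walk (hypothetical types and names, not kernel code;
only the "hash <= upper bound" comparison mirrors fib6_select_path()):

#include <stdio.h>

/* Stand-in for a fib6 nexthop; only the upper bound matters for the
 * hash-threshold algorithm.
 */
struct nexthop {
	unsigned int upper_bound;	/* inclusive end of the hash range */
};

/* Two equal-weight nexthops splitting the 31-bit hash space. */
static struct nexthop nhs[] = {
	{ .upper_bound = 0x3fffffff },	/* first sibling */
	{ .upper_bound = 0x7fffffff },	/* last sibling covers the rest */
};

/* Return the first nexthop, starting at @start, whose range contains
 * @hash.
 */
static int select_path(int start, unsigned int hash)
{
	for (int i = start; i < 2; i++)
		if (hash <= nhs[i].upper_bound)
			return i;
	return -1;
}

int main(void)
{
	/* Starting from the first sibling balances the hash space... */
	printf("from first: 0x00000001 -> nh%d, 0x5fffffff -> nh%d\n",
	       select_path(0, 0x1), select_path(0, 0x5fffffff));
	/* ...starting from the last sibling always selects it. */
	printf("from last:  0x00000001 -> nh%d, 0x5fffffff -> nh%d\n",
	       select_path(1, 0x1), select_path(1, 0x5fffffff));
	return 0;
}

The second printf prints nh1 for both hashes, which is exactly the
single-path distribution the test reports.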

The fix is to find the first nexthop (sibling route) and choose the
first matching nexthop according to hash-threshold. Given that you and
Guillaume can reproduce the issue, can you please test the fix [1]?

I think Guillaume's patch exposed the issue because it caused the ip6gre
device to transmit a packet (a Router Solicitation, as part of the DAD
process for the IPv6 link-local address) as soon as the device is
brought up. With debug kernels this might happen while forwarding is
still disabled, as the test enables forwarding only at the end of the
setup.

When forwarding is disabled, the nexthop's neighbour state is taken
into account when choosing a route in rt6_select(), and round-robin is
performed between the two sibling routes. It is possible to end up in a
situation where rt6_select() always returns the second sibling route,
which fib6_select_path() will then always select due to its upper bound.
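
A similar sketch (again with hypothetical names; the real logic lives in
rt6_select() and rt6_score_route()) of how a missing neighbour on the
first sibling can pin every lookup to the second one, which then feeds
the last-nexthop case above:

#include <stdbool.h>
#include <stdio.h>

struct sibling {
	const char *name;
	bool have_neigh;	/* a valid neighbour entry exists */
};

/* path1's neighbour was never created, e.g. because the only packet
 * that would have resolved it was sent while forwarding was off.
 */
static struct sibling siblings[] = {
	{ "path1", false },
	{ "path2", true },
};

/* Simplified stand-in for rt6_select(): prefer a sibling whose
 * neighbour is valid; the real code scores routes via
 * rt6_score_route()/rt6_check_neigh().
 */
static const struct sibling *pick(void)
{
	for (int i = 0; i < 2; i++)
		if (siblings[i].have_neigh)
			return &siblings[i];
	return &siblings[0];	/* round-robin fallback in the real code */
}

int main(void)
{
	/* Every lookup resolves to path2, i.e. the last sibling, whose
	 * upper bound then matches every hash in fib6_select_path().
	 */
	for (int i = 0; i < 3; i++)
		printf("lookup %d -> %s\n", i, pick()->name);
	return 0;
}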

[1]
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fb2e99a56529..afcd66b73a92 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -412,11 +412,35 @@ static bool rt6_check_expired(const struct rt6_info *rt)
 	return false;
 }
 
+static struct fib6_info *
+rt6_multipath_first_sibling_rcu(const struct fib6_info *rt)
+{
+	struct fib6_info *iter;
+	struct fib6_node *fn;
+
+	fn = rcu_dereference(rt->fib6_node);
+	if (!fn)
+		goto out;
+	iter = rcu_dereference(fn->leaf);
+	if (!iter)
+		goto out;
+
+	while (iter) {
+		if (iter->fib6_metric == rt->fib6_metric &&
+		    rt6_qualify_for_ecmp(iter))
+			return iter;
+		iter = rcu_dereference(iter->fib6_next);
+	}
+
+out:
+	return NULL;
+}
+
 void fib6_select_path(const struct net *net, struct fib6_result *res,
 		      struct flowi6 *fl6, int oif, bool have_oif_match,
 		      const struct sk_buff *skb, int strict)
 {
-	struct fib6_info *match = res->f6i;
+	struct fib6_info *first, *match = res->f6i;
 	struct fib6_info *sibling;
 
 	if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
@@ -440,10 +464,18 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
 		return;
 	}
 
-	if (fl6->mp_hash <= atomic_read(&match->fib6_nh->fib_nh_upper_bound))
+	first = rt6_multipath_first_sibling_rcu(match);
+	if (!first)
 		goto out;
 
-	list_for_each_entry_rcu(sibling, &match->fib6_siblings,
+	if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
+	    rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
+			    strict) >= 0) {
+		match = first;
+		goto out;
+	}
+
+	list_for_each_entry_rcu(sibling, &first->fib6_siblings,
 				fib6_siblings) {
 		const struct fib6_nh *nh = sibling->fib6_nh;
 		int nh_upper_bound;
