Message-Id: <20251005-ipv6-set-saddr-to-prefsrc-before-hash-to-stabilize-ecmp-v1-1-d43b6ef00035@proton.me>
Date: Sun, 05 Oct 2025 20:49:55 +0000
From: Dmitry Z via B4 Relay <devnull+demetriousz.proton.me@...nel.org>
To: "David S. Miller" <davem@...emloft.net>,
David Ahern <dsahern@...nel.org>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Dmitry Z <demetriousz@...ton.me>
Subject: [PATCH net-next] net: ipv6: respect route prefsrc and fill empty
saddr before ECMP hash
From: Dmitry Z <demetriousz@...ton.me>

In an IPv6 ECMP scenario, if a multi-homed host initiates a connection,
`saddr` may remain empty during the initial call to `rt6_multipath_hash()`.
It gets filled later, once the outgoing interface (OIF) is determined and
`ipv6_dev_get_saddr()` (RFC 6724) selects the proper source address.

In some cases, this can cause the flow to switch paths: the first packets
go via one link, while the rest of the flow is routed over another.

A practical example is a Git-over-SSH session. When running `git fetch`,
the initial control traffic uses TOS 0x48, but data transfer switches to
TOS 0x20. This triggers a new hash computation, and at that time `saddr`
is already populated. As a result, packets with TOS 0x20 may be sent via
a different OIF, because `rt6_multipath_hash()` now produces a different
result.

This issue can happen even if the matched IPv6 route specifies a `src`
(preferred source) address. The actual impact depends on the network
topology. In my setup, the flow was redirected to a different switch and
reached another host, leading to TCP RSTs from the host where the session
was never established.

Possible workarounds:
1. Use netfilter to normalize the DSCP field before route lookup.
   (breaks DSCP/TOS assignment set by the socket)
2. Exclude the source address from the ECMP hash via sysctl knobs.
   (excludes an important part from hash computation)

This patch uses the `fib6_prefsrc.addr` value from the selected route to
populate `saddr` before ECMP hash computation, ensuring consistent path
selection across the flow.

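
The effect can be illustrated outside the kernel with a small userspace
sketch. Everything named `toy_*` below is hypothetical and exists only for
this illustration: the hash is plain FNV-1a over the two addresses, not the
computation done by `rt6_multipath_hash()`, and `struct toy_flow` is a
reduced stand-in for the real flow keys. It only shows why filling an empty
`saddr` from the preferred source before hashing makes the first lookup
agree with later ones.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ADDR_LEN 16

/* Hypothetical, reduced stand-in for the flow keys used in the lookup. */
struct toy_flow {
	uint8_t saddr[TOY_ADDR_LEN];
	uint8_t daddr[TOY_ADDR_LEN];
};

/* True when the address is all zeroes, i.e. the unspecified address "::". */
static bool toy_addr_any(const uint8_t *addr)
{
	static const uint8_t zero[TOY_ADDR_LEN];

	return memcmp(addr, zero, TOY_ADDR_LEN) == 0;
}

/*
 * FNV-1a over saddr, then daddr. Illustrative only: the kernel hashes
 * its flow keys differently. The point is that saddr feeds the result.
 */
static uint32_t toy_flow_hash(const struct toy_flow *fl)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < TOY_ADDR_LEN; i++)
		h = (h ^ fl->saddr[i]) * 16777619u;
	for (i = 0; i < TOY_ADDR_LEN; i++)
		h = (h ^ fl->daddr[i]) * 16777619u;
	return h;
}

/* Same condition as the patch: fill only an empty saddr, only from a set prefsrc. */
static void toy_fill_saddr(struct toy_flow *fl, const uint8_t *prefsrc)
{
	if (toy_addr_any(fl->saddr) && !toy_addr_any(prefsrc))
		memcpy(fl->saddr, prefsrc, TOY_ADDR_LEN);
}

/* Map the hash onto one of n_paths equal-cost next hops (n_paths > 0). */
static unsigned int toy_select_path(const struct toy_flow *fl,
				    unsigned int n_paths)
{
	return toy_flow_hash(fl) % n_paths;
}
```

Calling `toy_fill_saddr()` before `toy_select_path()` on the first lookup
yields the same path index as a later lookup whose `saddr` is already
populated with that address. Skipping it hashes an all-zero `saddr` on the
first lookup, so the two lookups may land on different paths.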
Signed-off-by: Dmitry Z <demetriousz@...ton.me>
---
net/ipv6/route.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 3299cfa12e21c96ecb5c4dea5f305d5f7ce16084..d2ecf16417a6f0fc6956f0ebff3d8dea593da059 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2270,6 +2270,11 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 	if (res.f6i == net->ipv6.fib6_null_entry)
 		goto out;
 
+	if (ipv6_addr_any(&fl6->saddr) &&
+	    !ipv6_addr_any(&res.f6i->fib6_prefsrc.addr)) {
+		fl6->saddr = res.f6i->fib6_prefsrc.addr;
+	}
+
 	fib6_select_path(net, &res, fl6, oif, false, skb, strict);
 
 	/*Search through exception table */
---
base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a
change-id: 20251005-ipv6-set-saddr-to-prefsrc-before-hash-to-stabilize-ecmp-6d646ec96ac4
Best regards,
--
Dmitry Z <demetriousz@...ton.me>