Message-ID: <aOPEYwnyGnMQCp-f@shredder>
Date: Mon, 6 Oct 2025 16:30:11 +0300
From: Ido Schimmel <idosch@...sch.org>
To: demetriousz@...ton.me
Cc: "David S. Miller" <davem@...emloft.net>,
	David Ahern <dsahern@...nel.org>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next] net: ipv6: respect route prfsrc and fill empty
 saddr before ECMP hash

On Sun, Oct 05, 2025 at 08:49:55PM +0000, Dmitry Z via B4 Relay wrote:
> From: Dmitry Z <demetriousz@...ton.me>
> 
> In an IPv6 ECMP scenario, if a multi-homed host initiates a connection,
> `saddr` may remain empty during the initial call to `rt6_multipath_hash()`.
> It gets filled later, once the outgoing interface (OIF) is determined and
> `ipv6_dev_get_saddr()` (RFC 6724) selects the proper source address.
> 
> In some cases, this can cause the flow to switch paths: the first packets
> go via one link, while the rest of the flow is routed over another.
> 
> A practical example is a Git-over-SSH session. When running `git fetch`,
> the initial control traffic uses TOS 0x48, but data transfer switches to
> TOS 0x20. This triggers a new hash computation, and at that time `saddr`
> is already populated. As a result, packets with TOS 0x20 may be sent via
> a different OIF, because `rt6_multipath_hash()` now produces a different
> result.
> 
> This issue can happen even if the matched IPv6 route specifies a `src`
> (preferred source) address. The actual impact depends on the network
> topology. In my setup, the flow was redirected to a different switch and
> reached another host, leading to TCP RSTs from the host where the session
> was never established.
> 
> Possible workarounds:
> 1. Use netfilter to normalize the DSCP field before route lookup.
>    (breaks DSCP/TOS assignment set by the socket)
> 2. Exclude the source address from the ECMP hash via sysctl knobs.
>    (excludes an important part from hash computation)

Two more options (which I didn't test):

3. Setting "IPQoS" in SSH config to a single value. It should prevent
OpenSSH from switching DSCP while the connection is alive. Switching
DSCP triggers a route lookup since commit 305e95bb893c ("net-ipv6:
changes to ->tclass (via IPV6_TCLASS) should sk_dst_reset()"). To be
clear, I don't think this commit is problematic as there are other
events that can invalidate cached dst entries.

4. Setting "BindAddress" in SSH config. It should make sure that the
same source address is used for all route lookups.

> This patch uses the `fib6_prefsrc.addr` value from the selected route to
> populate `saddr` before ECMP hash computation, ensuring consistent path
> selection across the flow.

I'm not convinced the problem is in the kernel. As long as all the
packets are sent with the same 5-tuple, it's up to the network to
deliver them correctly. I don't know what your topology looks like, but
in the general case packets belonging to the same flow can be routed via
different paths over time. If multiple servers can service incoming SSH
connections, then there should be a stateful load balancer between them
and the clients so that packets belonging to the same flow are always
delivered to the same server. ECMP cannot be relied on to do load
balancing alone as it's stateless.
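
To illustrate the stateless point, here is a minimal Python sketch. It is
not kernel code: `ecmp_pick()` is a hypothetical helper, and SHA-256 with a
modulo is only a stand-in for whatever hash and nexthop selection the
kernel actually uses. The addresses are documentation prefixes chosen for
the example. It only shows that the chosen path is a pure function of the
hashed fields and the current nexthop set, so a different input (such as an
unspecified versus a populated source address) or a different nexthop set
can map the same connection to a different path.

```python
import hashlib
from ipaddress import IPv6Address


def ecmp_pick(saddr, daddr, proto, sport, dport, nexthops):
    """Pick a nexthop from the flow keys alone; no per-flow state is kept."""
    key = b"".join([
        IPv6Address(saddr).packed,   # "::" models a not-yet-selected source
        IPv6Address(daddr).packed,
        proto.to_bytes(1, "big"),
        sport.to_bytes(2, "big"),
        dport.to_bytes(2, "big"),
    ])
    digest = hashlib.sha256(key).digest()  # stand-in hash, an assumption
    index = int.from_bytes(digest[:4], "big") % len(nexthops)
    return nexthops[index]


paths = ["nh-a", "nh-b"]  # hypothetical nexthop names

# Lookup done before a source address was selected.
first = ecmp_pick("::", "2001:db8::1", 6, 40000, 22, paths)

# Later lookup for the same connection, source address now known.
later = ecmp_pick("2001:db8:1::10", "2001:db8::1", 6, 40000, 22, paths)

print(first, later)  # may or may not match; nothing ties them together
```

The two lookups hash different inputs, so nothing guarantees they agree.
The same holds when the nexthop set itself changes, which is why a stateful
element is needed if the flow must always reach the same server.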
