Message-Id: <20231012234025.4025-1-nalramli@fastly.com>
Date:   Thu, 12 Oct 2023 19:40:25 -0400
From:   "Nabil S. Alramli" <nalramli@...tly.com>
To:     sbhogavilli@...tly.com, davem@...emloft.net, dsahern@...nel.org,
        edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Cc:     srao@...tly.com, dev@...ramli.com
Subject: [net] ipv4: Fix broken PMTUD when using L4 multipath hash

From: Suresh Bhogavilli <sbhogavilli@...tly.com>

On a node with multiple network interfaces, enabling the layer 4 hash
policy with net.ipv4.fib_multipath_hash_policy=1 breaks path MTU
discovery: a TCP connection does not make progress unless the incoming
ICMP Fragmentation Needed (type 3, code 4) message is received on the
egress interface of the nexthop selected for the socket.

This is because build_sk_flow_key() does not pass the sport and dport
from the socket when calling flowi4_init_output(). This appears to be a
copy/paste error from the build_skb_flow_key() -> __build_flow_key() ->
flowi4_init_output() call chain used for packet forwarding. There, an
skb is present and is later passed to fib_multipath_hash(), which can
extract both sport and dport from the skb if the L4 hash policy is in
use.

In the socket write case, fib_multipath_hash() does not get an skb, so
it expects the fl4 to have sport and dport populated when L4 hashing is
in use. Not populating them results in a nexthop exception entry being
created against a nexthop that may not be the one used by the socket.
That entry is then not matched when inet_csk_rebuild_route() is called
to update the cached dst entry in the socket, so TCP does not lower its
MSS and the connection does not make progress.
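
The mismatch described above can be illustrated with a small user-space
sketch. Everything in it is hypothetical: struct toy_flow_key, the
XOR-fold in toy_l4_hash() and the addresses are illustrative stand-ins,
not the kernel's struct flowi4 or its multipath hash. It only assumes
that the ports feed into the hash, and counts how often a lookup with
zeroed ports lands on a different nexthop than the socket's own flow.

```c
#include <stdint.h>

/* Hypothetical, simplified flow key; NOT the kernel's struct flowi4. */
struct toy_flow_key {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
	uint8_t  proto;
};

/* Toy stand-in for an L4 hash: XOR-fold of the 5-tuple. This is not the
 * kernel's hash; it only keeps the property that ports affect the result.
 */
static uint32_t toy_l4_hash(const struct toy_flow_key *k)
{
	return k->saddr ^ k->daddr ^ k->sport ^ k->dport ^ k->proto;
}

/* Pick one of nr_nexthops paths from the hash. */
static unsigned int toy_select_nexthop(const struct toy_flow_key *k,
				       unsigned int nr_nexthops)
{
	return toy_l4_hash(k) % nr_nexthops;
}

/* For 'count' consecutive source ports starting at first_sport, count the
 * flows whose nexthop differs between a key carrying the real ports and
 * the same key with both ports zeroed.
 */
static unsigned int toy_count_mismatches(uint16_t first_sport,
					 unsigned int count,
					 unsigned int nr_nexthops)
{
	unsigned int i, mismatches = 0;

	for (i = 0; i < count; i++) {
		struct toy_flow_key with_ports = {
			.saddr = 0xc0000201,	/* 192.0.2.1, example address */
			.daddr = 0xc6336402,	/* 198.51.100.2, example address */
			.sport = (uint16_t)(first_sport + i),
			.dport = 443,
			.proto = 6,		/* TCP */
		};
		struct toy_flow_key zero_ports = with_ports;

		zero_ports.sport = 0;
		zero_ports.dport = 0;

		if (toy_select_nexthop(&with_ports, nr_nexthops) !=
		    toy_select_nexthop(&zero_ports, nr_nexthops))
			mismatches++;
	}
	return mismatches;
}
```

With two nexthops, toy_count_mismatches(40000, 1000, 2) reports that
half of the flows in this toy model pick a different nexthop once the
ports are zeroed. With a single nexthop there is nothing to disagree on
and the count is zero.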

Fix this by providing the source and destination ports to the
flowi4_init_output() call in build_sk_flow_key().

Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Suresh Bhogavilli <sbhogavilli@...tly.com>
---
 net/ipv4/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e2bf4602b559..2517eb12b7ef 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -557,7 +557,8 @@ static void build_sk_flow_key(struct flowi4 *fl4, const struct sock *sk)
 			   inet_test_bit(HDRINCL, sk) ?
 				IPPROTO_RAW : sk->sk_protocol,
 			   inet_sk_flowi_flags(sk),
-			   daddr, inet->inet_saddr, 0, 0, sk->sk_uid);
+			   daddr, inet->inet_saddr, inet->inet_dport, inet->inet_sport,
+			   sk->sk_uid);
 	rcu_read_unlock();
 }
 
-- 
2.31.1
