Message-Id: <20231012234025.4025-1-nalramli@fastly.com>
Date: Thu, 12 Oct 2023 19:40:25 -0400
From: "Nabil S. Alramli" <nalramli@...tly.com>
To: sbhogavilli@...tly.com, davem@...emloft.net, dsahern@...nel.org,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: srao@...tly.com, dev@...ramli.com
Subject: [net] ipv4: Fix broken PMTUD when using L4 multipath hash
From: Suresh Bhogavilli <sbhogavilli@...tly.com>

On a node with multiple network interfaces, enabling the layer 4 hash
policy with net.ipv4.fib_multipath_hash_policy=1 breaks path MTU
discovery: a TCP connection does not make progress unless the incoming
ICMP Fragmentation Needed (type 3, code 4) message is received on the
egress interface of the nexthop selected for the socket.

This is because build_sk_flow_key() does not pass the socket's sport and
dport to flowi4_init_output(). This appears to be a copy/paste error
from the build_skb_flow_key() -> __build_flow_key() ->
flowi4_init_output() call chain used for packet forwarding. There an skb
is present and is later passed to fib_multipath_hash(), which can
extract both sport and dport from the skb when the L4 hash policy is in
use.

In the socket write case, fib_multipath_hash() does not get an skb, so
it expects the fl4 to have sport and dport populated when L4 hashing is
in use. Leaving them unpopulated creates the nexthop exception entry
against a nexthop that may not be the one used by the socket. That
entry is then not matched when inet_csk_rebuild_route() is called to
update the cached dst entry in the socket, so TCP does not lower its MSS
and the connection does not make progress.
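
The mismatch can be illustrated with a minimal sketch. It is not the
kernel's hash: struct toy_flow, toy_l4_hash() and toy_select_nexthop()
are hypothetical names, and the XOR-and-modulo selection is a stand-in
chosen only because it is easy to check by hand. It shows that a key
with zeroed ports can select a different nexthop than the key the
socket actually used.

```c
#include <stdint.h>

/* Hypothetical, simplified flow key; the kernel's struct flowi4
 * carries more fields than this. */
struct toy_flow {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* Toy stand-in for an L4 multipath hash. This is an assumption made
 * for illustration, NOT the algorithm used by fib_multipath_hash(). */
static uint32_t toy_l4_hash(const struct toy_flow *fl)
{
	uint32_t ports = ((uint32_t)fl->sport << 16) | fl->dport;

	return fl->saddr ^ fl->daddr ^ ports;
}

/* Pick one of n nexthops from the toy hash. */
static unsigned int toy_select_nexthop(const struct toy_flow *fl,
				       unsigned int n)
{
	return toy_l4_hash(fl) % n;
}
```

With two nexthops, a flow from 10.0.0.1 to 10.0.0.2 with sport 40000
and dport 443 selects index 0 in this toy model, while the same
addresses with both ports zeroed select index 1. An exception recorded
under the second key would sit on a nexthop the socket never uses.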

Fix this by passing the source and destination ports to the
flowi4_init_output() call in build_sk_flow_key().

Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Suresh Bhogavilli <sbhogavilli@...tly.com>
---
net/ipv4/route.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e2bf4602b559..2517eb12b7ef 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -557,7 +557,8 @@ static void build_sk_flow_key(struct flowi4 *fl4, const struct sock *sk)
 			   inet_test_bit(HDRINCL, sk) ?
 				IPPROTO_RAW : sk->sk_protocol,
 			   inet_sk_flowi_flags(sk),
-			   daddr, inet->inet_saddr, 0, 0, sk->sk_uid);
+			   daddr, inet->inet_saddr, inet->inet_dport, inet->inet_sport,
+			   sk->sk_uid);
 	rcu_read_unlock();
 }
--
2.31.1