Message-ID:
<PH0PR10MB4504888284FF4CBA648197D0ACB82@PH0PR10MB4504.namprd10.prod.outlook.com>
Date: Mon, 21 Apr 2025 01:10:40 +0000
From: Venkat Venkatsubra <venkat.x.venkatsubra@...cle.com>
To: "davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org"
<netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"kuba@...nel.org" <kuba@...nel.org>
CC: "razor@...ckwall.org" <razor@...ckwall.org>
Subject: IP de-fragmentation failing on bridge
A brief problem description.
ping from a VM interface with mtu 9000 fails:
# ping -c 1 -s 9100 192.168.16.124
PING 192.168.16.124 (192.168.16.124) 9100(9128) bytes of data.
1 packet transmitted, 0 received, 100% packet loss
On the host the echo request arrives as 2 fragments:
frag1 iplen 8996
frag2 iplen 152
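For reference, the two lengths follow from plain IPv4 fragmentation arithmetic. The helper below is only a sketch to show where 8996 and 152 come from: ipv4_frag_lens() is a made-up name, not a kernel function, and it assumes a 20-byte IP header with no options.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR_LEN 20u /* assumption: no IP options */
#define ICMP_HDR_LEN 8u  /* ICMP echo header */

/*
 * Hypothetical helper: store the total IP length of each fragment of an
 * ICMP echo carrying icmp_data bytes, sent over a link with the given mtu.
 * Returns the number of entries written to lens[] (at most max).
 */
static size_t ipv4_frag_lens(unsigned int icmp_data, unsigned int mtu,
                             unsigned int *lens, size_t max)
{
    unsigned int remaining, per_frag;
    size_t n = 0;

    assert(mtu >= IPV4_HDR_LEN + 8u);

    /* IP payload to split: ICMP header plus ICMP data */
    remaining = icmp_data + ICMP_HDR_LEN;
    /* every fragment except the last carries a multiple of 8 payload bytes */
    per_frag = ((mtu - IPV4_HDR_LEN) / 8u) * 8u;

    while (remaining > 0 && n < max) {
        unsigned int chunk = remaining > per_frag ? per_frag : remaining;

        lens[n++] = chunk + IPV4_HDR_LEN;
        remaining -= chunk;
    }
    return n;
}
```

With icmp_data = 9100 and mtu = 9000 this yields two entries, 8996 and 152, matching the lengths seen on the host.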
They are passed to the bridge.
bridge-nf-call-iptables is enabled.
# cat /proc/sys/net/bridge/bridge-nf-call-iptables
1
The bridge's mtu is 9000.
# ip link show dev privnet
11: privnet: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000
It needs to be forwarded over icbond0 which also has mtu 9000.
# ip link show dev icbond0
10: icbond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000
At the time of defragmentation, the bridge drops the reassembled IP packet because
frag_max_size (8996) exceeds the "mtu", which it takes to be 1500.
Prior to
ac6627a2 net: ipv4: Consolidate ipv4_mtu and ip_dst_mtu_maybe_forward
the bridge was getting the "mtu" from its "fake_mtu" dst_ops.
That returned the interface mtu, 9000 in this case, which worked for us.
After that patch it depends on the dst metric for RTAX_MTU:
/* 'forwarding = true' case should always honour route mtu */
mtu = dst_metric_raw(dst, RTAX_MTU);
From dst_metric_raw we get 1500 which is the default set by the bridge.
static const u32 br_dst_default_metrics[RTAX_MAX] = {
[RTAX_MTU - 1] = 1500,
};
Since the bridge sets the metrics as read_only, this metric does not seem to reflect
the true mtu, which is larger (9000) in our case.
Is this already a resolved issue?
Based on the latest bridge code I couldn't find a match for a fix.
If we want to retain the pre-ac6627a2 behavior,
would keeping 0 as the "fake" RTAX_MTU metric be a viable option?
i.e.
[RTAX_MTU - 1] = 0,
instead of
[RTAX_MTU - 1] = 1500,
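To make the reasoning behind this explicit, here is a small userspace model of the lookup, not the kernel code itself. struct model_dst, model_forward_mtu() and model_would_drop() are made-up names. The model assumes that a nonzero RTAX_MTU metric is used as-is and that a zero metric falls back to the device mtu; that fallback is the assumption the proposal rests on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of the dst and device used here. */
struct model_dst {
    unsigned int metric_mtu; /* what dst_metric_raw(dst, RTAX_MTU) would return */
    unsigned int dev_mtu;    /* mtu of the bridge device, e.g. privnet */
};

/* Assumed order: a nonzero route metric wins, otherwise use the device mtu. */
static unsigned int model_forward_mtu(const struct model_dst *dst)
{
    assert(dst != NULL);

    if (dst->metric_mtu)
        return dst->metric_mtu;
    return dst->dev_mtu;
}

/* Drop condition as described above: frag_max_size exceeds the chosen mtu. */
static bool model_would_drop(unsigned int frag_max_size,
                             const struct model_dst *dst)
{
    return frag_max_size && frag_max_size > model_forward_mtu(dst);
}
```

In this model, metric_mtu = 1500 with dev_mtu = 9000 drops a packet whose frag_max_size is 8996, while metric_mtu = 0 selects 9000 and lets it through.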
Thanks for your help.
Thanks,
Venkat