Message-ID: <7855995f-6908-4684-b5be-dbff8415843e@blackwall.org>
Date: Thu, 15 May 2025 17:36:22 +0200
From: Nikolay Aleksandrov <razor@...ckwall.org>
To: Ido Schimmel <idosch@...dia.com>, netdev@...r.kernel.org,
bridge@...ts.linux.dev
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
edumazet@...gle.com, venkat.x.venkatsubra@...cle.com, horms@...nel.org,
pablo@...filter.org, fw@...len.de
Subject: Re: [PATCH net] bridge: netfilter: Fix forwarding of fragmented packets
On 5/15/25 10:48, Ido Schimmel wrote:
> When netfilter defrag hooks are loaded (due to the presence of conntrack
> rules, for example), fragmented packets entering the bridge will be
> defragged by the bridge's pre-routing hook (br_nf_pre_routing() ->
> ipv4_conntrack_defrag()).
>
> Later on, in the bridge's post-routing hook, the defragged packet will
> be fragmented again. If the size of the largest fragment is larger than
> what the kernel has determined as the destination MTU (using
> ip_skb_dst_mtu()), the defragged packet will be dropped.
>
> Before commit ac6627a28dbf ("net: ipv4: Consolidate ipv4_mtu and
> ip_dst_mtu_maybe_forward"), ip_skb_dst_mtu() would return dst_mtu() as
> the destination MTU. Assuming the dst entry attached to the packet is
> the bridge's fake rtable one, this would simply be the bridge's MTU (see
> fake_mtu()).
>
> However, after the above-mentioned commit, ip_skb_dst_mtu() ends up
> returning the route's MTU stored in the dst entry's metrics. Ideally, in
> case the dst entry is the bridge's fake rtable one, this should be the
> bridge's MTU as the bridge takes care of updating this metric when its
> MTU changes (see br_change_mtu()).
>
> Unfortunately, the last operation is a no-op given the metrics attached
> to the fake rtable entry are marked as read-only. Therefore,
> ip_skb_dst_mtu() ends up returning 1500 (the initial MTU value) and
> defragged packets are dropped during fragmentation when dealing with
> large fragments and high MTU (e.g., 9k).
>
> Fix by moving the fake rtable entry's metrics to be per-bridge (in a
> similar fashion to the fake rtable entry itself) and marking them as
> writable, thereby allowing MTU changes to be reflected.
>
> Fixes: 62fa8a846d7d ("net: Implement read-only protection and COW'ing of metrics.")
> Fixes: 33eb9873a283 ("bridge: initialize fake_rtable metrics")
> Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@...cle.com>
> Closes: https://lore.kernel.org/netdev/PH0PR10MB4504888284FF4CBA648197D0ACB82@PH0PR10MB4504.namprd10.prod.outlook.com/
> Tested-by: Venkat Venkatsubra <venkat.x.venkatsubra@...cle.com>
> Signed-off-by: Ido Schimmel <idosch@...dia.com>
> ---
> net/bridge/br_nf_core.c | 7 ++-----
> net/bridge/br_private.h | 1 +
> 2 files changed, 3 insertions(+), 5 deletions(-)
>
Acked-by: Nikolay Aleksandrov <razor@...ckwall.org>