Message-ID: <CANP3RGfuOGoB1msF1evzsgKf5qZZbNDCHDzvgPBHRGyepDuu+g@mail.gmail.com>
Date: Mon, 9 Nov 2020 11:38:16 -0800
From: Maciej Żenczykowski <maze@...gle.com>
To: Steffen Klassert <steffen.klassert@...unet.com>
Cc: Lorenzo Colitti <lorenzo@...gle.com>,
mtk81216 <lina.wang@...iatek.com>,
"David S . Miller" <davem@...emloft.net>,
Alexey Kuznetsov <kuznet@....inr.ac.ru>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
Jakub Kicinski <kuba@...nel.org>,
Herbert Xu <herbert@...dor.apana.org.au>,
Matthias Brugger <matthias.bgg@...il.com>,
Linux NetDev <netdev@...r.kernel.org>,
lkml <linux-kernel@...r.kernel.org>,
linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org,
Greg Kroah-Hartman <gregkh@...gle.com>
Subject: Re: [PATCH] xfrm:fragmented ipv4 tunnel packets in inner interface

On Mon, Nov 9, 2020 at 1:58 AM Steffen Klassert
<steffen.klassert@...unet.com> wrote:
>
> On Thu, Nov 05, 2020 at 01:52:01PM +0900, Lorenzo Colitti wrote:
> > On Tue, Sep 15, 2020 at 4:30 PM Steffen Klassert
> > <steffen.klassert@...unet.com> wrote:
> > > > In esp's tunnel mode, if inner interface is ipv4, outer is ipv4, one big
> > > > packet which travels through tunnel will be fragmented with outer
> > > > interface's mtu, peer server will remove tunnelled esp header and assemble
> > > > them in big packet. After forwarding such packet to next endpoint, it will
> > > > be dropped because of exceeding mtu or be returned ICMP(packet-too-big).
> > >
> > > What is the exact case where packets are dropped? Given that the packet
> > > was fragmented (and reassembled), I'd assume the DF bit was not set. So
> > > every router along the path is allowed to fragment again if needed.
> >
> > In general, isn't it just suboptimal to rely on fragmentation if the
> > sender already knows the packet is too big? That's why we have things
> > like path MTU discovery (RFC 1191).
>
> When we setup packets that are sent from a local socket, we take
> MTU/PMTU info we have into account. So we don't create fragments in
> that case.
>
> When forwarding packets it is different. The router that can not
> TX the packet because it exceeds the MTU of the sending interface
> is responsible to either fragment (if DF is not set), or send a
> PMTU notification (if DF is set). So if we are able to transmit
> the packet, we do it.
>
> > Fragmentation is generally
> > expensive, increases the chance of packet loss, and has historically
> > caused lots of security vulnerabilities. Also, in real world networks,
> > fragments sometimes just don't work, either because intermediate
> > routers don't fragment, or because firewalls drop the fragments due to
> > security reasons.
> >
> > While it's possible in theory to ask these operators to configure
> > their routers to fragment packets, that may not result in the network
> > being fixed, due to hardware constraints, security policy or other
> > reasons.
>
> We can not really do anything here. If a flow has no DF bit set
> on the packets, we can not rely on PMTU information. If we have PMTU
> info on the route, then we have it because some other flow (that has
> DF bit set on the packets) triggered PMTU discovery. That means that
> the PMTU information is reset when this flow (with DF set) stops
> sending packets. So the other flow (with DF not set) will send
> big packets again.

PMTU is by default ignored by forwarding - because it's spoofable.

That said I wonder if my recent changes to honour route mtu (for ipv4)
haven't fixed this particular issue in the presence of correctly
configured device/route mtus...

I don't understand if the problem here is locally generated packets,
or forwarded packets.

It does seem like there is (or was) a bug somewhere... but it might
already be fixed (see above) or might be caused by a misconfiguration
of device mtu or routing rules.

I don't really understand the example.
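To make the forwarding behaviour discussed above concrete, here is a
rough sketch of the decision as I read it from this thread. It is a
simplified model, not the kernel code: Route, forwarding_mtu() and
forward_decision() are hypothetical names made up for illustration, and
the use_pmtu flag only loosely stands in for the
net.ipv4.ip_forward_use_pmtu sysctl.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    FORWARD = "forward"
    FRAGMENT = "fragment"
    ICMP_FRAG_NEEDED = "icmp-frag-needed"


@dataclass
class Route:
    # All fields are illustrative assumptions, not kernel structures.
    dev_mtu: int                          # MTU of the egress device
    route_mtu: Optional[int] = None       # explicitly configured route MTU
    learned_pmtu: Optional[int] = None    # PMTU learned via ICMP (spoofable)


def forwarding_mtu(route: Route, use_pmtu: bool = False) -> int:
    """Pick the MTU applied to a forwarded packet.

    By default the learned PMTU is ignored because it is spoofable;
    a configured route MTU is honoured, otherwise the device MTU is used.
    """
    if use_pmtu and route.learned_pmtu is not None:
        return min(route.learned_pmtu, route.dev_mtu)
    if route.route_mtu is not None:
        return route.route_mtu
    return route.dev_mtu


def forward_decision(pkt_len: int, df: bool, route: Route,
                     use_pmtu: bool = False) -> Action:
    """Forward if the packet fits; otherwise fragment (DF clear)
    or send a PMTU notification (DF set)."""
    if pkt_len <= forwarding_mtu(route, use_pmtu):
        return Action.FORWARD
    return Action.ICMP_FRAG_NEEDED if df else Action.FRAGMENT
```

With a device MTU of 1500 and a route MTU of 1400, a 1450-byte packet
without DF gets fragmented and one with DF gets an ICMP reply, whatever
PMTU some other flow may have learned on that route.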
>
> > Those operators may also be in a position to place
> > requirements on devices that have to use their network. If the Linux
> > stack does not work as is on these networks, then those devices will
> > have to meet those requirements by making out-of-tree changes. It
> > would be good to avoid that if there's a better solution (e.g., make
> > this configurable via sysctl).
>
> We should not try to workaround broken configurations, there are just
> too many possibilities to configure a broken network.