Message-ID: <20150708191751.3fc42725@vostro>
Date: Wed, 8 Jul 2015 19:17:51 +0300
From: Timo Teras <timo.teras@....fi>
To: Hannes Frederic Sowa <hannes@...essinduktion.org>
Cc: netdev@...r.kernel.org
Subject: Re: ip_forward_use_pmtu and forwarding to xfrm'ed gre
On Wed, 08 Jul 2015 17:52:32 +0200
Hannes Frederic Sowa <hannes@...essinduktion.org> wrote:
> On Wed, 2015-07-08 at 16:30 +0300, Timo Teras wrote:
> > This is probably due to the way xfrm+gre work together. On
> > first packet, the gre tunnel driver updates pmtu for the inner flow,
> > which is expected to be honored always. And if the 'ttl' value is
> > set for gre tunnel, no re-fragmentation is allowed as the inner flow
> > should know better. This does have the side effect that if the very
> > first packet is large, it'll be dropped to 'learn' the pmtu.
> >
> > It's probably not possible to detect this kind of target easily, as
> > the xfrm can be applied or not even on per inner target IP basis (as
> > then tunnel destination IP can be dynamic for nbma tunnels).
>
> I am currently not sure if we actually have resolved the xfrm path at
> the time we enter ip_forward, I actually thought we do. In this case
> we should be able to use skb_dst->dst->path->header_len and subtract
> it before using it to fragment the packets. I hope it is so easy... :)
It is not. The inner skb just knows that it's going from ethX -> greX.
That route is what contains the path MTU, and that's what ip_forward
will use.

Only in gre_xmit is it resolved where the tunnel packet goes, and only
then is the xfrm resolved. Thus update_pmtu works fully internally here.
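
To make the two stages concrete, here is a minimal userspace sketch (not
kernel code). The struct, the function names and the overhead figure in
the usage note are hypothetical and exist only for illustration; the
real xfrm overhead depends on the transform in use.

```c
#include <stdbool.h>

/* Hypothetical model of the two lookups; none of these names exist in the kernel. */
struct fwd_model {
	unsigned int gre_dev_mtu;   /* MTU of greX, all that the forward path sees */
	unsigned int xfrm_overhead; /* known only once xmit resolves xfrm; 0 if none */
};

/* Stage 1: forwarding only knows ethX -> greX, so it checks against greX's MTU. */
static bool passes_forward_check(const struct fwd_model *m, unsigned int len)
{
	return len <= m->gre_dev_mtu;
}

/* Stage 2: after the tunnel route and xfrm are resolved, the inner flow's
 * usable MTU shrinks by the xfrm overhead. */
static unsigned int inner_pmtu_after_xmit(const struct fwd_model *m)
{
	if (m->xfrm_overhead >= m->gre_dev_mtu)
		return 0; /* guard against unsigned underflow */
	return m->gre_dev_mtu - m->xfrm_overhead;
}

/* A packet that passes stage 1 but is too big for stage 2 is the one that
 * gets dropped so the inner flow can 'learn' the pmtu. */
static bool dropped_to_learn_pmtu(const struct fwd_model *m, unsigned int len)
{
	return passes_forward_check(m, len) && len > inner_pmtu_after_xmit(m);
}
```

With an assumed greX MTU of 1476 and an assumed xfrm overhead of 56
bytes, a 1476-byte packet passes the forward check but exceeds the
1420 bytes available after xmit, so the first such packet is lost.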
> I would actually avoid telling anyone to enable using the path mtu
> information in forwarding ever again.
The problem here is that the pmtu framework is used internally to relay
the trusted stacking pmtu in addition to the pmtu learned from the wire.
> > So I wonder if ip_gre driver can workaround this somehow, by e.g.
> > refragmenting if necessary. Or if we just should update the sysctl's
> > help text to say that this is another scenario where it needs to be
> > turned on.
>
> If above idea does not work, we could simply add an option to gre
> driver to set skb->ignore_df, but I don't like that much.
This is not acceptable. The gre driver has two operating modes: DF and
non-DF mode (which is triggered by the 'ttl inherit' or 'ttl <number>'
option on tunnel creation). DF mode intentionally sets DF on all
tunnel packets so that the pmtu is learned and relayed up the stack. In
non-DF mode the tunnel packet's DF is derived from the encapsulated
packet.

Basically this info could be used. If the target is gre1 in DF mode, we
should trust the pmtu. Otherwise the existing internal mechanism
breaks.
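
A rough userspace sketch of that decision, assuming the forward path
could see the tunnel's mode. The enum, struct and function below are
hypothetical; only the ip_forward_use_pmtu sysctl name comes from this
thread. It is an illustration of the idea, not a proposed patch.

```c
#include <stdbool.h>

/* Hypothetical classification of the forwarding target. */
enum tunnel_df_mode {
	TARGET_NOT_TUNNEL,
	TARGET_TUNNEL_DF,     /* tunnel sets DF on all tunnel packets */
	TARGET_TUNNEL_NON_DF, /* DF derived from the encapsulated packet */
};

struct fwd_target {
	enum tunnel_df_mode mode;
	unsigned int dev_mtu;      /* device MTU of the target */
	unsigned int learned_pmtu; /* 0 if no pmtu has been recorded */
};

/* Pick the MTU the forward path would enforce: the recorded pmtu is
 * honored when the sysctl allows it, or when the target is a DF-mode
 * tunnel whose pmtu was relayed internally and can be trusted. */
static unsigned int forward_mtu(const struct fwd_target *t,
				bool ip_forward_use_pmtu)
{
	bool trust_pmtu = ip_forward_use_pmtu || t->mode == TARGET_TUNNEL_DF;

	if (trust_pmtu && t->learned_pmtu && t->learned_pmtu < t->dev_mtu)
		return t->learned_pmtu;
	return t->dev_mtu;
}
```

With the sysctl off, a DF-mode tunnel target would still get the
relayed pmtu, while any other target would fall back to the device MTU.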
Thoughts?