Date:	Fri, 15 Mar 2013 13:38:20 +0200
From:	Timo Teras <timo.teras@....fi>
To:	netdev@...r.kernel.org
Subject: Re: linux-3.6+, gre+ipsec+forwarding = IP fragmentation broken

On Fri, 15 Mar 2013 11:25:16 +0200
Timo Teras <timo.teras@....fi> wrote:

> On Wed, 13 Mar 2013 17:14:53 +0200
> Timo Teras <timo.teras@....fi> wrote:
> 
> > In the typical DMVPN setup with an IPv4-ESP-GRE-IPv4 stack, it seems
> > that IPv4 fragmentation broke around 3.6 for forwarded packets.
> > 
> > It would seem that fragmentation works for locally generated
> > packets. Also PMTU (DF set) seems to work for both forwarded and
> > locally generated packets. But packets forwarded to the gre device
> > that then get IPsec encrypted are not fragmented properly.
> > 
> > 3.4.x kernels work, 3.6 and 3.8 series tested and fail similarly.
> 
> Actually 3.4.x vanilla does not work. It works only with 38d523e
> "ipv4: Remove output route check in ipv4_mtu" applied which I've been
> cherry-picking to my builds.
> 
> > I was going through the changelog and it seems that MTU is now
> > handled in nexthop exceptions and one needs to produce the full
> > flow info to update it. I'm wondering if this does not hold true in
> > my code path as ip_gre rewraps the forwarded packet and creates new
> > IP header - when it next goes to the xfrm code (which sends the
> > ICMP error) the inner iphdr is no longer accessible. Would this
> > cause the breakage that I'm seeing? Or is the forwarded flow's mtu
> > still updated somehow?
> 
> I have now a theory on what goes wrong.
> 
> My gre tunnel is configured with 'ttl 64', so the tunnel IP header
> always gets the DF bit set to do proper path MTU discovery. The locally
> generated ICMP messages I get imply that re-fragmentation happens
> only at the level of the tunnel's IPv4 header, but by then it is too
> late: the large packet is queued and IPsec'ed, and it is the IPsec'ed
> packet that the stack tries to fragment (it has DF set, so that fails
> and the packet is dropped).
> 
> I believe ip_gre should explicitly fragment the inner IPv4 and IPv6
> packets if the tunnel's ttl is not inherited (resulting in DF bit set
> in the tunnel's IPv4 header).
> 
> So basically ip_gre was wrong all along - things just happened to
> work because GRO/GSO was not implemented in ip_gre, and because of the
> way the (now deleted) routing cache exposed pmtu.
> 
> Does this make sense?

Not really. It seems the fragmentation should already happen at the
earlier dst level. This does imply, though, that GSO cannot be used in
ip_gre if ttl != inherit.

I added some ip_gre debugging and the following seems to happen:

- the mtu is calculated correctly on the xmit path:
  dst_mtu(&rt->dst) = 1458 (the tunnel's XFRMed IPv4 path)

- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
  is called with mtu=1430, which seems correct

- dst_mtu(skb_dst(skb)) still seems to return 1472 after the above
  call, which is wrong, so update_pmtu is not working.

- skb->dev->ifindex implies that skb->dev points to the gre device when
  update_pmtu is called (and not the ethX on which the packet was
  received), so ip_rt_update_pmtu(), which eventually calls
  build_skb_flow_key(), is likely using the wrong ifindex for the flow
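
The arithmetic behind the numbers above can be sketched in userspace C.
This is only an illustration, not kernel code: the 28-byte overhead
(20-byte outer IPv4 header plus an 8-byte GRE header with key) is an
assumption inferred from 1458 - 1430, and both helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encapsulation overhead: 20-byte outer IPv4 header plus an
 * 8-byte GRE header (base header + key). Inferred from the numbers in
 * this mail, not read from the actual tunnel configuration. */
enum { ASSUMED_GRE_OVERHEAD = 20 + 8 };

/* Hypothetical helper: MTU left for the inner packet once the tunnel
 * headers are added on top of it. */
static unsigned int inner_mtu(unsigned int path_mtu, unsigned int overhead)
{
	assert(overhead < path_mtu);
	return path_mtu - overhead;
}

/* Hypothetical helper: true if the MTU still reported for the inner dst
 * is larger than the value that was just pushed to it, i.e. the update
 * did not stick. */
static bool pmtu_update_lost(unsigned int reported_mtu, unsigned int wanted_mtu)
{
	return reported_mtu > wanted_mtu;
}
```

With the values from the debug output, inner_mtu(1458, 28) gives the
1430 passed to update_pmtu, and pmtu_update_lost(1472, 1430) is true.
Under the same assumed overhead, 1472 would correspond to an underlying
MTU of 1500, but that is a guess.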


- Timo