Date: Fri, 15 Mar 2013 13:38:20 +0200
From: Timo Teras <timo.teras@....fi>
To: netdev@...r.kernel.org
Subject: Re: linux-3.6+, gre+ipsec+forwarding = IP fragmentation broken

On Fri, 15 Mar 2013 11:25:16 +0200
Timo Teras <timo.teras@....fi> wrote:

> On Wed, 13 Mar 2013 17:14:53 +0200
> Timo Teras <timo.teras@....fi> wrote:
>
> > In the typical DMVPN setup with IPv4-ESP-GRE-IPv4 stack, it seems
> > that IPv4 fragmentation got broken around 3.6 for forwarded packets.
> >
> > It would seem that fragmentation works for locally generated
> > packets. Also PMTU (DF set) seems to work for both forwarded and
> > locally generated packets. But forwarded packets to a gre device
> > that get IPsec encrypted do not get fragmented properly.
> >
> > 3.4.x kernels work, 3.6 and 3.8 series tested and fail similarly.
>
> Actually 3.4.x vanilla does not work. It works only with 38d523e
> "ipv4: Remove output route check in ipv4_mtu" applied, which I've been
> cherry-picking to my builds.
>
> > I was going through the changelog and it seems that MTU is now
> > handled in nexthop exceptions and one needs to produce the full
> > flow info to update it. I'm wondering if this does not hold true in
> > my code path, as ip_gre rewraps the forwarded packet and creates a
> > new IP header - when it next goes to the xfrm code (which sends the
> > ICMP error) the inner iphdr is no longer accessible. Would this
> > cause the breakage that I'm seeing? Or is the forward flow's mtu
> > still updated somehow?
>
> I now have a theory on what goes wrong.
>
> My gre tunnel is configured with 'ttl 64' so the tunnel IP header
> always gets the DF bit set to do proper path-mtu. The kind of locally
> generated ICMP messages I get imply that re-fragmentation happens
> only on the tunnel's IPv4 header level - but it'll be too late then:
> the large packet is queued, IPsec'ed, and it is the IPsec'ed packet
> that the stack tries to fragment (but it has DF set, so that fails
> and the packet is dropped).
>
> I believe ip_gre should explicitly fragment the inner IPv4 and IPv6
> packets if the tunnel's ttl is not inherited (resulting in the DF bit
> being set in the tunnel's IPv4 header).
>
> So basically ip_gre worked wrong all along - things just happened to
> work due to GRO/GSO not being implemented in ip_gre, and the way (the
> now deleted) routing cache exposed pmtu.
>
> Does this make sense?

Not really. It seems the fragmentation should already happen on the
earlier dst level. Though, this implies that GSO cannot be used in
ip_gre if ttl != inherit.

I added some ip_gre debugging and the following seems to happen:

- the mtu is calculated correctly on the xmit path:
  dst_mtu(&rt->dst) = 1458 (the tunnel's XFRMed IPv4 path)

- skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
  is called with mtu=1430, which seems correct

- dst_mtu(skb_dst(skb)) still seems to return 1472 after the above
  call, which is wrong, so update_pmtu is not working

- skb->dev->ifindex implies skb->dev points to the gre device when
  update_pmtu is being called (and not the ethX from which the packet
  was received), so ip_rt_update_pmtu(), which eventually calls
  build_skb_flow_key(), is likely using the wrong ifindex for the flow

- Timo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html