netdev - Re: PROBLEM: MTU of ipsec tunnel drops continuously until traffic stops


Message-ID: <bba4a160385c4a36aecdd8d385e01c69@svr-chch-ex1.atlnz.lc>
Date:	Tue, 19 Jul 2016 05:49:06 +0000
From:	Matt Bennett <Matt.Bennett@...iedtelesis.co.nz>
To:	Steffen Klassert <steffen.klassert@...unet.com>
CC:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: PROBLEM: MTU of ipsec tunnel drops continuously until traffic
 stops

On 07/05/2016 03:55 PM, Matt Bennett wrote:
> On 07/04/2016 11:12 PM, Steffen Klassert wrote:
>> On Mon, Jul 04, 2016 at 03:52:50AM +0000, Matt Bennett wrote:
>>> *Resending as plain text so the mailing list accepts it.. Sorry Steffen and Herbert*
>>>
>>> Hi,
>>>
>>> During long-run testing of an ipsec tunnel over a PPP link it was found that traffic would occasionally stop flowing over the tunnel. Eventually the traffic would start again on its own; however, running the command "ip route flush cache" causes traffic to start flowing again immediately.
>>>
>>> Note, I am using a 4.4.6 based kernel, however I see no major differences between 4.4.6 and 4.4.14 (current LTS) in any of the code I am debugging. I have manually debugged the code as far as I can, but I don't know the code well enough to make further progress. What I have uncovered is outlined below:
>>>
>>> By pinging the other end of the tunnel when the traffic stops flowing I get messages like the following:
>>>
>>> 10-AR4050#ping 172.16.0.5
>>> PING 172.16.0.5 (172.16.0.5) 56(84) bytes of data.
>>>   From 172.16.0.6 icmp_seq=1 Frag needed and DF set (mtu = 46)
>>>   From 172.16.0.6 icmp_seq=2 Frag needed and DF set (mtu = 46)
>>>
>>> but this is weird considering (note the mtu values):
>>>
>>> [root@...AR4050 /flash]# ip link
>>> 16778240: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc htb state UP mode DEFAULT group default qlen 3
>>>       link/ppp
>>> 14: tunnel64@...E: <POINTOPOINT,MULTICAST,UP,LOWER_UP> mtu 1200 qdisc htb state UNKNOWN mode DEFAULT group default qlen 1
>>>       link/ipip 203.0.113.10 peer 203.0.113.5
>>>
>>> The code that generates the ICMP_FRAG_NEEDED packet is vti_xmit() (ip_vti.c), where there is a check of the skb length against the mtu of the dst entry. Since the mtu is lower than the packet length (debug shows the mtu is 46, as expected from the ping output), the ICMP error is generated.
>>
>> Seems like you use vti tunnels. Is tunnel64@...E a vti device, and
>> if so did you set the mtu to 1200?
> Yes it is a vti device with mtu manually set to 1200. Similarly the
> other end of the tunnel is a vti with mtu manually set to 1200. There is
> traffic flowing across the tunnel of random size between 512 to 1500 bytes.
>>
>> Not sure if it is related to your problem, but there was a recent
>> fix for vti pmtu handling. It was commit d6af1a31 ("vti: Add pmtu
>> handling to vti_xmit.") Do you have this on your branch?
> Yes, that problem was reported from another one of our tests. That patch
> is applied to our branch. It is the code added from that patch that
> explicitly sends the ICMP_FRAG_NEEDED packet. Before this patch I
> presume the packets would have simply been sent (even though the cached
> mtu values were buggy).
>>
>>>
>>> Digging further I find that when the issue occurs the mtu value is being updated in what appears to be an error case in xfrm_bundle_ok (xfrm_policy.c). Specifically the block of code:
>>>
>>> if (likely(!last))
>>>           return 1;
>>>
>>> is not hit, meaning there is a difference between the cached mtu value and the value just calculated. I then see this code being hit continuously, and each time the mtu is lowered further, i.e. (I don't know if the drop by 80 bytes is significant):
>>>
>>> 1200
>>> 1118
>>> 1038
>>> 958
>>> 878
>>>    ....
>>> 46
>>
>> I remember that we had a similar problem with IPsec when no
>> vti was used some years ago...
>>
>> Unfortunately, today is my last office day before my vacation,
>> so no fix from me for the next two weeks.
>>
>
>
Hi Steffen,

I figured you would be back from your holiday soon. I haven't been able to make much progress on this issue; however, I have found an interesting patch that appears to address an issue similar to the one I have reported (albeit for the ipv6 case).

Commit 00bc0ef5880dc7b82f9c320dead4afaad48e47be "ipv6: Skip XFRM lookup if dst_entry in socket cache is valid" mentions "... To put it another way, the path MTU shrinks each time we miss the flow cache, which later on leads to incorrectly fragmented payload."
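
To make the suspected failure mode concrete, here is a minimal userspace sketch of how a path MTU can shrink on every recomputation. It is not the kernel code: struct bundle_model, recompute_from_link() and recompute_from_cached() are hypothetical names, and the overhead is just a parameter. The 80 used in the usage note is borrowed from the step between the values reported above (1118, 1038, 958, 878), not derived from any header size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a cached route/bundle entry (not a kernel struct). */
struct bundle_model {
	unsigned int link_mtu;   /* MTU configured on the device, never changes */
	unsigned int cached_mtu; /* MTU currently cached for the path */
};

/* Subtract without wrapping below zero. */
static unsigned int sub_clamped(unsigned int value, unsigned int overhead)
{
	return value > overhead ? value - overhead : 0;
}

/*
 * Stable recomputation: always start from the device MTU, so repeating
 * the calculation yields the same result every time.
 */
static void recompute_from_link(struct bundle_model *b, unsigned int overhead)
{
	assert(b != NULL);
	b->cached_mtu = sub_clamped(b->link_mtu, overhead);
}

/*
 * Compounding recomputation: start from the value cached last time, so
 * the overhead is subtracted again on every call and the MTU keeps
 * shrinking until it reaches the floor (0 in this model).
 */
static void recompute_from_cached(struct bundle_model *b, unsigned int overhead)
{
	assert(b != NULL);
	b->cached_mtu = sub_clamped(b->cached_mtu, overhead);
}
```

Starting both models at 1200 and recomputing three times with an overhead of 80, recompute_from_link() stays at 1120 while recompute_from_cached() goes 1120, 1040, 960. Whether xfrm_bundle_ok() actually behaves like the second function in the vti case is exactly what I have not been able to confirm.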