netdev - Re: PROBLEM: MTU of ipsec tunnel drops continuously until traffic stops


Message-ID: <5ed871b8f02b40a8b7be29a5715d501d@svr-chch-ex1.atlnz.lc>
Date:	Thu, 21 Jul 2016 21:41:30 +0000
From:	Matt Bennett <Matt.Bennett@...iedtelesis.co.nz>
To:	Steffen Klassert <steffen.klassert@...unet.com>
CC:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: PROBLEM: MTU of ipsec tunnel drops continuously until traffic
 stops

On 07/21/2016 09:13 PM, Steffen Klassert wrote:
> Hi Matt,
>
> I did some vti tests over the last few days, but I was unable to
> reproduce it.
>
> On Tue, Jul 19, 2016 at 05:49:06AM +0000, Matt Bennett wrote:
>> On 07/05/2016 03:55 PM, Matt Bennett wrote:
>>> On 07/04/2016 11:12 PM, Steffen Klassert wrote:
>>>> On Mon, Jul 04, 2016 at 03:52:50AM +0000, Matt Bennett wrote:
>>>>> *Resending as plain text so the mailing list accepts it.. Sorry Steffen and Herbert*
>>>>>
>>>>> Hi,
>>>>>
>>>>> During long-run testing of an ipsec tunnel over a PPP link, we found that traffic would occasionally stop flowing over the tunnel. The traffic eventually starts again on its own, but running the command "ip route flush cache" makes it start flowing again immediately.
>
> Do you need the ppp link to reproduce it? How often does that happen?
> It would be good to find a minimal setup with which the bug is reproducible.
>
>
Our original tests were long-running, i.e. we set traffic flowing across the tunnel and noticed that the throughput would occasionally drop significantly. Based on my reproduction method, I believe the ppp link may be required.

To reproduce this I have 2 devices:

Device 1:
ppp0 - 203.0.113.10/32 (mtu 1492)
16778240: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc htb state UP mode DEFAULT group default qlen 3
     link/ppp

tunnel64 - 172.16.0.6/30 (mtu 1200) - note this is a VTI with IPSEC protection
14: tunnel64@...E: <POINTOPOINT,MULTICAST,UP,LOWER_UP> mtu 1200 qdisc htb state UNKNOWN mode DEFAULT group default qlen 1
     link/ipip 203.0.113.10 peer 203.0.113.5

Device 2:
ppp1 - 203.0.113.5/32 (mtu 1492)
16778241: ppp1: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UP mode DEFAULT group default qlen 3
     link/ppp

tunnel64 - 172.16.0.5/30 (mtu 1200) - note this is a VTI with IPSEC protection
20: tunnel64@...E: <POINTOPOINT,MULTICAST,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
     link/ipip 203.0.113.5 peer 203.0.113.10

I send generated traffic with a packet size of 1300 bytes across the tunnel (which obviously causes the packets to be fragmented). Then I bring ppp1 on device 2 DOWN and then back UP.
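
For reference, the fragment sizes expected in this setup can be worked out by hand. The sketch below assumes that the 1300-byte figure is the total IPv4 length and that the header is 20 bytes with no options (the message does not say either way). frag_sizes() is a hypothetical helper written for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR_LEN 20u  /* assumption: IPv4 header without options */

/* Hypothetical helper: write the total length of each IPv4 fragment of a
 * tot_len-byte packet sent over a link with the given MTU into out[].
 * Returns the number of entries written (truncated at max_out), or 0 if
 * the inputs cannot describe a valid packet or a usable MTU. */
static size_t frag_sizes(unsigned tot_len, unsigned mtu,
                         unsigned *out, size_t max_out)
{
    assert(out != NULL && max_out > 0);

    if (tot_len < IPV4_HDR_LEN || mtu < IPV4_HDR_LEN + 8u)
        return 0;
    if (tot_len <= mtu) {          /* fits as-is, no fragmentation */
        out[0] = tot_len;
        return 1;
    }

    unsigned payload = tot_len - IPV4_HDR_LEN;
    /* Every fragment except the last carries a multiple of 8 payload bytes. */
    unsigned per_frag = ((mtu - IPV4_HDR_LEN) / 8u) * 8u;
    size_t n = 0;

    while (payload > 0 && n < max_out) {
        unsigned chunk = payload < per_frag ? payload : per_frag;
        out[n++] = chunk + IPV4_HDR_LEN;
        payload -= chunk;
    }
    return n;
}
```

Under those assumptions, frag_sizes(1300, 1200, out, 4) yields two fragments of 1196 and 124 bytes, so each 1300-byte packet is expected to leave the tunnel as two packets.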

At this stage, on device 1, where I have printk debug in the function ip_fragment(), the unlikely block is hit:

	if (unlikely(!skb->ignore_df ||
		     (IPCB(skb)->frag_max_size &&
		      IPCB(skb)->frag_max_size > mtu))) {
		printk(KERN_ERR "mtu = %u, dev = %s, src = %u, dst = %u, tot_len = %u\n", mtu, skb->dev->name, iph->saddr, iph->daddr, iph->tot_len);
		printk(KERN_ERR "!skb->ignore_df = %u, IPCB(skb)->frag_max_size = %u\n", !skb->ignore_df, IPCB(skb)->frag_max_size);
		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
			  htonl(mtu));
		kfree_skb(skb);
		return -EMSGSIZE;
	}

which prints:
mtu = 1200, dev = tunnel64, src = 3405803786, dst = 3405803781, tot_len = 1244
!skb->ignore_df = 1, IPCB(skb)->frag_max_size = 0

Note that the src and dst IPs of the packet are src=203.0.113.10 and dst=203.0.113.5, i.e. the ppp addresses (so the tunnel is trying to send the PPP packet???)
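
The integers in the debug output can be checked against those addresses with a few lines of C. This is only a sketch: u32_to_octets() and u32_to_dotted() are hypothetical helpers, and they take the most significant byte of the printed value as the first octet. That matches what a raw __be32 printed with %u gives on a big-endian machine; that the test devices are big-endian is an inference from the output, not something stated in this message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: split a 32-bit value into four octets,
 * most significant byte first. */
static void u32_to_octets(uint32_t v, unsigned char o[4])
{
    o[0] = (unsigned char)(v >> 24);
    o[1] = (unsigned char)(v >> 16);
    o[2] = (unsigned char)(v >> 8);
    o[3] = (unsigned char)v;
}

/* Hypothetical helper: format the value as dotted-quad text.
 * buf should hold at least 16 bytes. Returns snprintf()'s result,
 * i.e. the length of the formatted string. */
static int u32_to_dotted(uint32_t v, char *buf, size_t len)
{
    unsigned char o[4];

    assert(buf != NULL);
    u32_to_octets(v, o);
    return snprintf(buf, len, "%u.%u.%u.%u",
                    (unsigned)o[0], (unsigned)o[1],
                    (unsigned)o[2], (unsigned)o[3]);
}
```

With these helpers, 3405803786 formats as 203.0.113.10 and 3405803781 as 203.0.113.5, i.e. the ppp0 and ppp1 addresses listed above.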

Interestingly, I also have debug in icmp_unreach(), which handles the ICMP_DEST_UNREACH sent from the tunnel:

	case ICMP_FRAG_NEEDED:
		/* for documentation of the ip_no_pmtu_disc
		 * values please see
		 * Documentation/networking/ip-sysctl.txt
		 */
		switch (net->ipv4.sysctl_ip_no_pmtu_disc) {
...
		case 0:
			info = ntohs(icmph->un.frag.mtu);
			printk(KERN_ERR "mtu = %u, dev = %s, src = %u, dst = %u, tot_len = %u\n", info, skb->dev->name, iph->saddr, iph->daddr, iph->tot_len);
		}

which prints:
mtu = 1200, dev = lo, src = 3405803786, dst = 3405803781, tot_len = 1244

I am confused at this stage (is the packet sent from the loopback interface and then routed out tunnel64?)
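
As a side note on the two snippets: the sender passes htonl(mtu) while the receiver reads the value with ntohs(), and the two are consistent. In the "fragmentation needed" message format of RFC 1191, the second 32-bit header word is 16 unused bits followed by the 16-bit next-hop MTU, so a 32-bit network-order value below 65536 places the MTU in the last two bytes. The sketch below models that in user space; frag_needed_mtu() is a hypothetical helper, and the description of how the word is laid out in the header is my reading of the format, not something shown in this message.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: given the 32-bit info word in network byte order
 * (as produced by htonl(mtu) on the sending side), return the next-hop
 * MTU a receiver reads from the last 16 bits of that header word. */
static uint16_t frag_needed_mtu(uint32_t info_be)
{
    unsigned char word[4];
    uint16_t mtu_be;

    memcpy(word, &info_be, sizeof word);        /* bytes as they sit in the header */
    memcpy(&mtu_be, word + 2, sizeof mtu_be);   /* bytes 2-3: next-hop MTU field */
    return ntohs(mtu_be);
}
```

For example, frag_needed_mtu(htonl(1200)) returns 1200 regardless of host byte order, which agrees with the mtu = 1200 seen in both printk outputs.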

The code then eventually reaches vti4_err(), which updates the pmtu on the ppp0 interface to 1200.

Then the code in xfrm_bundle_ok(), which I mentioned in an earlier email, is hit, and it continuously drops the MTU on the tunnel. However, I believe the behaviour I outlined above is the root cause and this is just a side effect.