netdev - Re: Occasional oops with IPSec and IPv6.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20111118.151054.669492198256098000.davem@davemloft.net>
Date:	Fri, 18 Nov 2011 15:10:54 -0500 (EST)
From:	David Miller <davem@...emloft.net>
To:	timo.teras@....fi
Cc:	nbowler@...iptictech.com, eric.dumazet@...il.com,
	netdev@...r.kernel.org
Subject: Re: Occasional oops with IPSec and IPv6.

From: Timo Teräs <timo.teras@....fi>
Date: Fri, 18 Nov 2011 22:06:47 +0200

> I'm not too familiar with the relevant IPv6 code, but it seems to be
> mostly modelled after the IPv4 side. Looking at the back trace offset
> inside ipv6_fragment, I'd say it was taking the "fast path" for
> constructing the fragments. So first guess is that the headroom check
> for allowing fast path to happen is not right.
> 
> Since the code seems to be treating separately hlen and struct frag_hdr,
> I'm wondering if the following patch would be in place?
> 
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 1c9bf8b..c35d9fc 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -675,7 +675,7 @@ int ip6_fragment(struct sk_buff *skb, int
> (*output)(struct sk_buff *))
>  			/* Correct geometry. */
>  			if (frag->len > mtu ||
>  			    ((frag->len & 7) && frag->next) ||
> -			    skb_headroom(frag) < hlen)
> +			    skb_headroom(frag) < hlen + sizeof(struct frag_hdr))
>  				goto slow_path_clean;
> 
>  			/* Partially cloned skb? */
> 
> 
> Alternatively, we could just run the "slow path" unconditionally with
> the test load to see if it fixes the issue. At least that'd be pretty
> good test if it's a problem in the ipv6 fragmentation code or something
> else.

This reminds me of the following change from Steffen Klassert in net-next:

commit 299b0767642a65f0c5446ab6d35e6df0daf43d33
Author: Steffen Klassert <steffen.klassert@...unet.com>
Date:   Tue Oct 11 01:43:33 2011 +0000

    ipv6: Fix IPsec slowpath fragmentation problem
    
    ip6_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path).
    On IPsec the effective mtu is lower because we need to add the protocol
    headers and trailers later when we do the IPsec transformations. So after
    the IPsec transformations the packet might be too big, which leads to a
    slowpath fragmentation then. This patch fixes this by building the packets
    based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr
    handling to this.
    
    Signed-off-by: Steffen Klassert <steffen.klassert@...unet.com>
    Signed-off-by: David S. Miller <davem@...emloft.net>

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 835c04b..1e20b64 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1193,6 +1193,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 	struct sk_buff *skb;
 	unsigned int maxfraglen, fragheaderlen;
 	int exthdrlen;
+	int dst_exthdrlen;
 	int hh_len;
 	int mtu;
 	int copy;
@@ -1248,7 +1249,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 		np->cork.hop_limit = hlimit;
 		np->cork.tclass = tclass;
 		mtu = np->pmtudisc == IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(rt->dst.path);
+		      rt->dst.dev->mtu : dst_mtu(&rt->dst);
 		if (np->frag_size < mtu) {
 			if (np->frag_size)
 				mtu = np->frag_size;
@@ -1259,16 +1260,17 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 		cork->length = 0;
 		sk->sk_sndmsg_page = NULL;
 		sk->sk_sndmsg_off = 0;
-		exthdrlen = rt->dst.header_len + (opt ? opt->opt_flen : 0) -
-			    rt->rt6i_nfheader_len;
+		exthdrlen = (opt ? opt->opt_flen : 0) - rt->rt6i_nfheader_len;
 		length += exthdrlen;
 		transhdrlen += exthdrlen;
+		dst_exthdrlen = rt->dst.header_len;
 	} else {
 		rt = (struct rt6_info *)cork->dst;
 		fl6 = &inet->cork.fl.u.ip6;
 		opt = np->cork.opt;
 		transhdrlen = 0;
 		exthdrlen = 0;
+		dst_exthdrlen = 0;
 		mtu = cork->fragsize;
 	}
 
@@ -1368,6 +1370,8 @@ alloc_new_skb:
 			else
 				alloclen = datalen + fragheaderlen;
 
+			alloclen += dst_exthdrlen;
+
 			/*
 			 * The last fragment gets additional space at tail.
 			 * Note: we overallocate on fragments with MSG_MODE
@@ -1419,9 +1423,9 @@ alloc_new_skb:
 			/*
 			 *	Find where to start putting bytes
 			 */
-			data = skb_put(skb, fraglen);
-			skb_set_network_header(skb, exthdrlen);
-			data += fragheaderlen;
+			data = skb_put(skb, fraglen + dst_exthdrlen);
+			skb_set_network_header(skb, exthdrlen + dst_exthdrlen);
+			data += fragheaderlen + dst_exthdrlen;
 			skb->transport_header = (skb->network_header +
 						 fragheaderlen);
 			if (fraggap) {
@@ -1434,6 +1438,7 @@ alloc_new_skb:
 				pskb_trim_unique(skb_prev, maxfraglen);
 			}
 			copy = datalen - transhdrlen - fraggap;
+
 			if (copy < 0) {
 				err = -EINVAL;
 				kfree_skb(skb);
@@ -1448,6 +1453,7 @@ alloc_new_skb:
 			length -= datalen - fraggap;
 			transhdrlen = 0;
 			exthdrlen = 0;
+			dst_exthdrlen = 0;
 			csummode = CHECKSUM_NONE;
 
 			/*
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 3486f62..6f7824e 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -542,8 +542,7 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
 		goto out;
 
 	offset = rp->offset;
-	total_len = inet_sk(sk)->cork.base.length - (skb_network_header(skb) -
-						     skb->data);
+	total_len = inet_sk(sk)->cork.base.length;
 	if (offset >= total_len - 1) {
 		err = -EINVAL;
 		ip6_flush_pending_frames(sk);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html