[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1285084705.2617.636.camel@edumazet-laptop>
Date: Tue, 21 Sep 2010 17:58:25 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Nick Bowler <nbowler@...iptictech.com>
Cc: linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
"David S. Miller" <davem@...emloft.net>,
Jarek Poplawski <jarkao2@...il.com>,
Patrick McHardy <kaber@...sh.net>
Subject: [PATCH v3] ip: fix truesize mismatch in ip fragmentation
Le mardi 21 septembre 2010 à 16:16 +0200, Eric Dumazet a écrit :
> Le mardi 21 septembre 2010 à 10:05 -0400, Nick Bowler a écrit :
>
> > This hunk introduces some whitespace damage.
> >
> > Anyway, I tried this with ESP on both IPv4 and IPv6 and it appears to
> > correct the issue. Thanks!
> >
>
> Indeed good catch.
>
> Here is an updated patch, I added your Tested-by
>
> Thanks for testing !
>
> [PATCH] ip : fix truesize mismatch in ip fragmentation
>
> We should not set frag->destructor to sock_wkfree() until we are sure we
> dont hit slow path in ip_fragment(). Or we risk uncharging
> frag->truesize twice, and in the end, having negative socket
> sk_wmem_alloc counter, or even freeing socket sooner than expected.
>
> Many thanks to Nick Bowler, who provided a very clean bug report and
> test program.
>
> While Nick bisection pointed to commit 2b85a34e911bf483 (net: No more
> expensive sock_hold()/sock_put() on each tx), underlying bug is older.
>
Hmm, while looking at git history, I found commit from Patrick
b2722b1c3a893e (ip_fragment: also adjust skb->truesize for packets not
owned by a socket)
As my patch reverts it, we probably want a more polished patch.
(Also port Patrick work to ipv6)
So here is a V3
[PATCH v3] ip: fix truesize mismatch in ip fragmentation
Special care should be taken when slow path is hit in ip_fragment() :
When walking through frags, we transfert truesize ownership from skb to
frags. Then if we hit a slow_path condition, we must undo this or risk
uncharging frags->truesize twice, and in the end, having negative socket
sk_wmem_alloc counter, or even freeing socket sooner than expected.
Many thanks to Nick Bowler, who provided a very clean bug report and
test program.
Thanks to Jarek for reviewing my first patch and providing a V2
While Nick bisection pointed to commit 2b85a34e911 (net: No more
expensive sock_hold()/sock_put() on each tx), underlying bug is older
(2.6.12-rc5)
A side effect is to extend work done in commit b2722b1c3a893e
(ip_fragment: also adjust skb->truesize for packets not owned by a
socket) to ipv6 as well.
Reported-and-bisected-by: Nick Bowler <nbowler@...iptictech.com>
Tested-by: Nick Bowler <nbowler@...iptictech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
CC: Jarek Poplawski <jarkao2@...il.com>
CC: Patrick McHardy <kaber@...sh.net>
---
net/ipv4/ip_output.c | 19 +++++++++++++------
net/ipv6/ip6_output.c | 18 +++++++++++++-----
2 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 04b6989..a643f7a 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -488,9 +488,8 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
* we can switch to copy when see the first bad fragment.
*/
if (skb_has_frags(skb)) {
- struct sk_buff *frag;
+ struct sk_buff *frag, *frag2;
int first_len = skb_pagelen(skb);
- int truesizes = 0;
if (first_len - hlen > mtu ||
((first_len - hlen) & 7) ||
@@ -503,18 +502,18 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
if (frag->len > mtu ||
((frag->len & 7) && frag->next) ||
skb_headroom(frag) < hlen)
- goto slow_path;
+ goto slow_path_undo;
/* Partially cloned skb? */
if (skb_shared(frag))
- goto slow_path;
+ goto slow_path_undo;
BUG_ON(frag->sk);
if (skb->sk) {
frag->sk = skb->sk;
frag->destructor = sock_wfree;
}
- truesizes += frag->truesize;
+ skb->truesize -= frag->truesize;
}
/* Everything is OK. Generate! */
@@ -524,7 +523,6 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
frag = skb_shinfo(skb)->frag_list;
skb_frag_list_init(skb);
skb->data_len = first_len - skb_headlen(skb);
- skb->truesize -= truesizes;
skb->len = first_len;
iph->tot_len = htons(first_len);
iph->frag_off = htons(IP_MF);
@@ -576,6 +574,15 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
}
IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
return err;
+
+slow_path_undo:
+ skb_walk_frags(skb, frag2) {
+ if (frag2 == frag)
+ break;
+ frag2->sk = NULL;
+ frag2->destructor = NULL;
+ skb->truesize += frag2->truesize;
+ }
}
slow_path:
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index d40b330..ca7ba44 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -639,7 +639,7 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
if (skb_has_frags(skb)) {
int first_len = skb_pagelen(skb);
- int truesizes = 0;
+ struct sk_buff *frag2;
if (first_len - hlen > mtu ||
((first_len - hlen) & 7) ||
@@ -651,18 +651,18 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
if (frag->len > mtu ||
((frag->len & 7) && frag->next) ||
skb_headroom(frag) < hlen)
- goto slow_path;
+ goto slow_path_undo;
/* Partially cloned skb? */
if (skb_shared(frag))
- goto slow_path;
+ goto slow_path_undo;
BUG_ON(frag->sk);
if (skb->sk) {
frag->sk = skb->sk;
frag->destructor = sock_wfree;
- truesizes += frag->truesize;
}
+ skb->truesize -= frag->truesize;
}
err = 0;
@@ -693,7 +693,6 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
first_len = skb_pagelen(skb);
skb->data_len = first_len - skb_headlen(skb);
- skb->truesize -= truesizes;
skb->len = first_len;
ipv6_hdr(skb)->payload_len = htons(first_len -
sizeof(struct ipv6hdr));
@@ -756,6 +755,15 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
IPSTATS_MIB_FRAGFAILS);
dst_release(&rt->dst);
return err;
+
+slow_path_undo:
+ skb_walk_frags(skb, frag2) {
+ if (frag2 == frag)
+ break;
+ frag2->sk = NULL;
+ frag2->destructor = NULL;
+ skb->truesize += frag2->truesize;
+ }
}
slow_path:
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists