[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130313232743.GA3686@raven>
Date: Wed, 13 Mar 2013 23:27:43 +0000
From: Tom Parkin <tparkin@...alix.com>
To: David Miller <davem@...emloft.net>
Cc: eric.dumazet@...il.com, netdev@...r.kernel.org
Subject: Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
On Thu, Mar 07, 2013 at 06:15:27PM -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Thu, 07 Mar 2013 14:47:24 -0800
>
> > On Thu, 2013-03-07 at 22:36 +0000, Tom Parkin wrote:
> >> When a fragmented IP packet is queued during device teardown, it is possible
> >> for the reassembled packet to hit the UDP rcv path with a NULL dst_entry dev
> >> pointer. Drop such packets to prevent an oops.
> >> ---
> >> net/ipv4/udp.c | 3 +++
> >> 1 file changed, 3 insertions(+)
> >>
> >> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> >> index 0a073a2..c38a4b1 100644
> >> --- a/net/ipv4/udp.c
> >> +++ b/net/ipv4/udp.c
> >> @@ -1700,6 +1700,9 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
> >> return __udp4_lib_mcast_deliver(net, skb, uh,
> >> saddr, daddr, udptable);
> >>
> >> + if (skb_dst(skb)->dev == NULL)
> >> + goto drop;
> >> +
> >> sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
> >>
> >> if (sk != NULL) {
> >
> >
> > Hmm... couldnt it be tested in reassembly layer instead ?
> >
> > Why is it specific to UDP ?
>
> Furthermore, when devices are unregistered we set the route's device
> pointer to point to the loopback device, not NULL, exactly to avoid
> this kind of problem.
>
> I don't see anything in our generic DST handler nor the ipv4 specific
> route handling, that would set dst->dev to NULL.
>
> You really have to show us how this can actually happen.
I've been working to this end, and while I don't have a root cause as
yet, I do have some more information.
I think what's happening is that the dst_entry refcounting is getting
screwed up either by the ip defrag code, or by something before that
in the rcv path. What I see from ftrace debugging is that an skb
fragment ends up queued on the reassembly queue while pointing to a
dst_entry with a refcount of 0. If the dst_entry should be deleted before
the final fragment in the frame arrives, then we end up accessing
free'd memory.
So far as I can make out, the l2tp code isn't doing anything untoward
which is causing this bug. My stress test simply makes it easier to
reproduce because I'm setting up and tearing down routes and devices
a lot while passing data. I'm lucky in that my dev branch seems to
reproduce this more easily than net-next master does, although the
same oops occurs on master if you're prepared to wait around for long
enough.
This ftrace debug log snippet shows the sort of behaviour I'm seeing.
The numbers in brackets after some dst pointer values represent the
refcount for that dst:
# The dst_entry is created with a refcount of 1
<idle>-0 [000] ..s2 112.770192: dst_alloc: dst ffff880012bbb0c0, refcnt 1
# First fragment is queued
<idle>-0 [000] ..s2 112.770193: ip_local_deliver: skb ffff880012864600, dst ffff880012bbb0c0(1) : is fragment
<idle>-0 [000] ..s2 112.770206: ip_local_deliver: skb ffff880012864600, dst ffff880012bbb0c0 : fragment queued
# Second and final fragment arrives, reassemble
ip-10970 [000] ..s1 112.770678: ip_local_deliver: skb ffff880010937e00, dst ffff880012bbb0c0(1) : is fragment
# skb_morph bumps refcount to 2, skb_consume drops it back down to 1
ip-10970 [000] ..s2 112.770682: ip_defrag: >>> clone skb ffff880010937e00 with dst ffff880012bbb0c0
ip-10970 [000] ..s2 112.770691: __copy_skb_header: don't dst_clone ffff880012bbb0c0
ip-10970 [000] ..s2 112.770691: ip_defrag: >>> morph skb ffff880010937e00 from ffff880012864600
ip-10970 [000] ..s2 112.770692: skb_release_head_state: drop skb ffff880010937e00 dst ref
ip-10970 [000] ..s2 112.770692: __copy_skb_header: cloning dst ffff880012bbb0c0 (skb ffff880012864600 -> skb ffff880010937e00)
ip-10970 [000] ..s2 112.770692: ip_defrag: >>> consume skb ffff880012864600
ip-10970 [000] ..s2 112.770693: skb_release_head_state: drop skb ffff880012864600 dst ref
ip-10970 [000] ..s2 112.770693: dst_release: dst ffff880012bbb0c0 newrefcnt 1
ip-10970 [000] ..s2 112.770698: ip_defrag: >>> coalesce loop
ip-10970 [000] ..s2 112.770698: ip_defrag: kfree_skb_partial(ffff880010937500, false)
ip-10970 [000] ..s2 112.770699: skb_release_head_state: drop skb ffff880010937500 dst ref
# skb is reassembled and delivered, dst has refcount of 1 now
ip-10970 [000] ..s1 112.770705: ip_local_deliver: skb ffff880010937e00, dst ffff880012bbb0c0(1) : queue defragmented
# l2tp_eth uses dev_forward_skb, which calls skb_dst_drop
ip-10970 [000] ..s1 112.770707: skb_release_head_state: drop skb ffff880010937e00 dst ref
ip-10970 [000] ..s1 112.770708: dst_release: dst ffff880012bbb0c0 newrefcnt 0
# Another skb arrives; dst refcount remains at 0
<idle>-0 [000] ..s2 112.771481: ip_local_deliver: skb ffff880012864500, dst ffff880012bbb0c0(0) : is fragment
<idle>-0 [000] ..s2 112.771494: ip_local_deliver: skb ffff880012864500, dst ffff880012bbb0c0 : fragment queued
The strange thing is that once the dst refcount reaches zero, another
skb hitting ip_input doesn't bump the refcount back up. This is
partially why I'm not sure whether the error is caused by the defrag
code, or by something prior to that in the rcv path.
--
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
Download attachment "signature.asc" of type "application/pgp-signature" (491 bytes)
Powered by blists - more mailing lists