Message-ID: <1332877149.3547.24.camel@edumazet-glaptop>
Date: Tue, 27 Mar 2012 21:39:09 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Ben Greear <greearb@...delatech.com>
Cc: David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
gregkh@...uxfoundation.org,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: RCU lock bug in 3.0.21 (bisected to: 682cb56a, fix NULL
dereferences in check_peer_redir)
On Tue, 2012-03-27 at 09:47 -0700, Ben Greear wrote:
> On 03/26/2012 04:39 PM, Eric Dumazet wrote:
> > On Mon, 2012-03-26 at 16:06 -0700, Ben Greear wrote:
> >> On 03/26/2012 02:53 PM, Ben Greear wrote:
> >>> On 03/26/2012 02:49 PM, David Miller wrote:
> >>>>
> >>>> Looks like all of those strange undiagnosable reports Dave Jones
> >>>> has been feeding us. Something in one part of the kernel leaves
> >>>> a lock held, and this shows up as a warning elsewhere.
> >>>
> >>> Every (initial) bug printout fingers ipv6 and the 'ip' tool on my system.
> >>
> >> I added a patch to convert rcu_read_lock/unlock to macros so
> >> that I could automatically grab the call site (_THIS_IP_)
> >> and pass it into the lockdep framework instead of the (useless)
> >> _THIS_IP_ in the old rcu_read_lock method which at best seems to
> >> only indicate which module the issue relates to...
> >
> > Hi Ben
> >
> > Does this problem also appear with the current tree?
> > (This could be a problem with the backport, as it was full of
> > dependencies)
> >
> > Also, if you use a patch to better track rcu_read_lock()/unlock(), you
> > could add new macros as well to track that a particular unlock() matches
> > one given lock() (maybe returning the rcu_preempt_depth at
> > rcu_read_lock() time, but maybe a more absolute ref would be better).
> >
> > So we could have a warning if an unlock() doesn't match the lock().
> >
> > inet6_dump_fib() was already a suspect, but we could not find why.
>
>
> Ok, I tried the patch below, and got the result farther down. Is this
> what you were thinking of? (The lockdep warning about the rcu lock still
> being held happened immediately after this, so it appears the depth mismatch
> does represent this problem.)
>
>
> [greearb@fs3 linux-3.0.dev.y]$ git diff
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 0f9b37a..ae3c7c9 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -366,6 +366,7 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
>  	struct hlist_node *node;
>  	struct hlist_head *head;
>  	int res = 0;
> +	int depth = current->lockdep_depth;
>
>  	s_h = cb->args[0];
>  	s_e = cb->args[1];
> @@ -410,6 +411,8 @@ next:
>  	}
>  out:
>  	rcu_read_unlock();
> +	WARN(depth != current->lockdep_depth, "depth: %i lockdep-depth: %i\n",
> +	     depth, current->lockdep_depth);
>  	cb->args[1] = e;
>  	cb->args[0] = h;
>
>
>
> ------------[ cut here ]------------
> WARNING: at /home/greearb/git/linux-3.0.dev.y/net/ipv6/ip6_fib.c:415 inet6_dump_fib+0x25c/0x292 [ipv6]()
> Hardware name: To be filled by O.E.M.
> depth: 1 lockdep-depth: 2
> Modules linked in: 8021q garp stp llc fuse macvlan pktgen coretemp hwmon sunrpc ipv6 uinput arc4 ath9k snd_hda_codec_realtek mac80211 snd_hda_intel
> snd_hda_codec snd_hwdep snd_seq ath9k_common ath9k_hw snd_seq_device snd_pcm ath snd_timer e1000e cfg80211 snd mei(C) ppdev microcode i2c_i801 iTCO_wdt
> soundcore serio_raw pcspkr snd_page_alloc iTCO_vendor_support parport_pc parport i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
> Pid: 6563, comm: ip Tainted: G C 3.0.25+ #16
> Call Trace:
> [<ffffffff81046866>] warn_slowpath_common+0x80/0x98
> [<ffffffff81046912>] warn_slowpath_fmt+0x41/0x43
> [<ffffffffa0251a3a>] inet6_dump_fib+0x25c/0x292 [ipv6]
> [<ffffffff813af450>] netlink_dump+0x5b/0x19b
> [<ffffffff81385da2>] ? consume_skb+0x28/0x2a
> [<ffffffff813af7bf>] netlink_recvmsg+0x1c7/0x2f8
> [<ffffffff8137c6cf>] __sock_recvmsg_nosec+0x65/0x6e
> [<ffffffff8137dde0>] __sock_recvmsg+0x49/0x54
> [<ffffffff8137e349>] sock_recvmsg+0xa6/0xbf
> [<ffffffff81072bf8>] ? lock_release_non_nested+0x9d/0x227
> [<ffffffff810ca002>] ? might_fault+0x4e/0x9e
> [<ffffffff810ca04b>] ? might_fault+0x97/0x9e
> [<ffffffff81387cae>] ? copy_from_user+0x2a/0x2c
> [<ffffffff810ca002>] ? might_fault+0x4e/0x9e
> [<ffffffff81388080>] ? verify_iovec+0x4f/0xa3
> [<ffffffff8137e0c4>] __sys_recvmsg+0x147/0x21e
> [<ffffffff81063868>] ? up_read+0x1e/0x36
> [<ffffffff810fc9fb>] ? fcheck_files+0xb7/0xee
> [<ffffffff810fcb30>] ? fget_light+0x3b/0xbc
> [<ffffffff8137e8a0>] sys_recvmsg+0x3d/0x5b
> [<ffffffff81450e92>] system_call_fastpath+0x16/0x1b
> ---[ end trace 5232c09c4fb31d15 ]---
>
>
>
I found the bug in rt6_fill_node().
I will send a patch in a couple of minutes.
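
For anyone following the thread, here is a minimal userspace sketch of the
check discussed above: lock() hands back the nesting depth as a token, and
unlock() warns if the depth it returns to does not match that token. The
call site is captured with macros, in the spirit of the _THIS_IP_ idea.
Everything named demo_* (and the two callees) is hypothetical and made up
for illustration. This is not the kernel's RCU or lockdep API, only a plain
counter standing in for the per-task depth.

```c
#include <stdio.h>

/* Hypothetical stand-in for a per-task read-side nesting counter. */
static int demo_depth;
/* Number of unbalanced unlocks seen so far. */
static int demo_mismatches;

/* Enter a read-side section; return the depth on entry as a token. */
static int demo_lock_at(const char *file, int line)
{
	(void)file;
	(void)line;
	return demo_depth++;
}

/* Leave a read-side section; complain if we do not return to the token. */
static void demo_unlock_at(int token, const char *file, int line)
{
	demo_depth--;
	if (demo_depth != token) {
		demo_mismatches++;
		fprintf(stderr, "%s:%d: unbalanced unlock: token %d depth %d\n",
			file, line, token, demo_depth);
	}
}

/* Macros so the caller's file and line are recorded, not the helper's. */
#define demo_read_lock()	demo_lock_at(__FILE__, __LINE__)
#define demo_read_unlock(tok)	demo_unlock_at((tok), __FILE__, __LINE__)

/* A callee that takes the lock and returns without releasing it. */
static void leaky_callee(void)
{
	(void)demo_read_lock();
}

/* A callee that pairs its lock and unlock correctly. */
static void balanced_callee(void)
{
	int tok = demo_read_lock();

	demo_read_unlock(tok);
}

/*
 * Shaped like a dump routine: lock, call into a helper, unlock.
 * Returns how many mismatches were detected during this call.
 */
static int run_dump(void (*callee)(void))
{
	int before = demo_mismatches;
	int tok = demo_read_lock();

	callee();
	demo_read_unlock(tok);
	return demo_mismatches - before;
}
```

Calling run_dump(balanced_callee) reports no mismatch, while
run_dump(leaky_callee) reports one at the outer unlock. As in the trace
above, the warning fires in the caller, which narrows the search to
whatever was called between the lock and the unlock.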