[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMArcTXTtJB8WUuJUumP2NHVg_c19m-6EheC3JRGxzseYmHVDw@mail.gmail.com>
Date: Tue, 7 Jan 2020 20:30:56 +0900
From: Taehee Yoo <ap420073@...il.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: syzbot <syzbot+4ec99438ed7450da6272@...kaller.appspotmail.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: WARNING: bad unlock balance in sch_direct_xmit
On Tue, 7 Jan 2020 at 14:36, Cong Wang <xiyou.wangcong@...il.com> wrote:
>
> Hi, Taehee
>
Hi, Cong!
Thank you for your diagnosis.
> On Sun, Jan 5, 2020 at 2:59 PM syzbot
> <syzbot+4ec99438ed7450da6272@...kaller.appspotmail.com> wrote:
> >
> > syzbot has bisected this bug to:
> >
> > commit ab92d68fc22f9afab480153bd82a20f6e2533769
> > Author: Taehee Yoo <ap420073@...il.com>
> > Date: Mon Oct 21 18:47:51 2019 +0000
> >
> > net: core: add generic lockdep keys
>
> Why netdev_update_lockdep_key() is needed?
>
> It causes the bug here because the unregister and register are not
> atomic although under RTNL, fast path could still lock with one key
> and unlock with another key after the previous got unregistered.
>
> From my understand of lockdep here, as long as the device itself
> is not changed, it doesn't need to update those keys.
>
> Thanks.
The goal of netdev_update_lockdep_key() is to avoid wrong lockdep splat.
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 02916f43bf63..cea5ef66b813 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2448,7 +2448,7 @@ static int do_set_master(struct net_device *dev,
int ifindex,
err = ops->ndo_del_slave(upper_dev, dev);
if (err)
return err;
- netdev_update_lockdep_key(dev);
+ //netdev_update_lockdep_key(dev);
} else {
return -EOPNOTSUPP;
}
Test commands:
ip link add team0 type team
ip link add team1 type team
ip link set team0 master team1
ip link set team0 nomaster
ip link set team1 master team0
Splats:
[ 105.484781][ T1178] ======================================================
[ 105.485894][ T1178] WARNING: possible circular locking dependency detected
[ 105.486791][ T1178] 5.5.0-rc2+ #264 Not tainted
[ 105.487369][ T1178] ------------------------------------------------------
[ 105.488130][ T1178] ip/1178 is trying to acquire lock:
[ 105.488948][ T1178] ffff8880521d9280
(&dev->addr_list_lock_key#4){+...}, at:
dev_uc_sync_multiple+0x95/0x120
[ 105.490336][ T1178]
[ 105.490336][ T1178] but task is already holding lock:
[ 105.491710][ T1178] ffff88806aa29280
(&dev->addr_list_lock_key#3){+...}, at: team_add_slave+0x165d/0x1972
[team]
[ 105.493471][ T1178]
[ 105.493471][ T1178] which lock already depends on the new lock.
[ 105.493471][ T1178]
[ 105.495423][ T1178]
[ 105.495423][ T1178] the existing dependency chain (in reverse order) is:
[ 105.496809][ T1178]
[ 105.496809][ T1178] -> #1 (&dev->addr_list_lock_key#3){+...}:
[ 105.497747][ T1178] _raw_spin_lock+0x30/0x70
[ 105.498201][ T1178] dev_uc_sync_multiple+0x95/0x120
[ 105.498733][ T1178] team_add_slave+0x1668/0x1972 [team]
[ 105.499293][ T1178] do_setlink+0xaab/0x2ef0
[ 105.499755][ T1178] __rtnl_newlink+0x9c5/0x1270
[ 105.500239][ T1178] rtnl_newlink+0x65/0x90
[ 105.500713][ T1178] rtnetlink_rcv_msg+0x4a8/0x890
[ 105.501269][ T1178] netlink_rcv_skb+0x121/0x350
[ 105.501799][ T1178] netlink_unicast+0x421/0x600
[ 105.502327][ T1178] netlink_sendmsg+0x65a/0xb90
[ 105.502890][ T1178] ____sys_sendmsg+0x5ce/0x7a0
[ 105.503469][ T1178] ___sys_sendmsg+0x10f/0x1b0
[ 105.504115][ T1178] __sys_sendmsg+0xc6/0x150
[ 105.504746][ T1178] do_syscall_64+0x99/0x4f0
[ 105.505391][ T1178] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 105.506206][ T1178]
[ 105.506206][ T1178] -> #0 (&dev->addr_list_lock_key#4){+...}:
[ 105.507238][ T1178] __lock_acquire+0x2d8d/0x3de0
[ 105.507907][ T1178] lock_acquire+0x164/0x3b0
[ 105.508536][ T1178] _raw_spin_lock+0x30/0x70
[ 105.509180][ T1178] dev_uc_sync_multiple+0x95/0x120
[ 105.509825][ T1178] team_add_slave+0x1668/0x1972 [team]
[ 105.510451][ T1178] do_setlink+0xaab/0x2ef0
[ 105.510961][ T1178] __rtnl_newlink+0x9c5/0x1270
[ 105.511525][ T1178] rtnl_newlink+0x65/0x90
[ 105.512026][ T1178] rtnetlink_rcv_msg+0x4a8/0x890
[ 105.512618][ T1178] netlink_rcv_skb+0x121/0x350
[ 105.513158][ T1178] netlink_unicast+0x421/0x600
[ 105.513843][ T1178] netlink_sendmsg+0x65a/0xb90
[ 105.514524][ T1178] ____sys_sendmsg+0x5ce/0x7a0
[ 105.515186][ T1178] ___sys_sendmsg+0x10f/0x1b0
[ 105.515852][ T1178] __sys_sendmsg+0xc6/0x150
[ 105.516493][ T1178] do_syscall_64+0x99/0x4f0
[ 105.517190][ T1178] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 105.518005][ T1178]
[ 105.518005][ T1178] other info that might help us debug this:
[ 105.518005][ T1178]
[ 105.519317][ T1178] Possible unsafe locking scenario:
[ 105.519317][ T1178]
[ 105.520268][ T1178] CPU0 CPU1
[ 105.520958][ T1178] ---- ----
[ 105.521640][ T1178] lock(&dev->addr_list_lock_key#3);
[ 105.522866][ T1178]
lock(&dev->addr_list_lock_key#4);
[ 105.523613][ T1178]
lock(&dev->addr_list_lock_key#3);
[ 105.524303][ T1178] lock(&dev->addr_list_lock_key#4);
[ 105.524890][ T1178]
[ 105.524890][ T1178] *** DEADLOCK ***
[ 105.524890][ T1178]
[ 105.525624][ T1178] 3 locks held by ip/1178:
[ 105.526010][ T1178] #0: ffffffffb0cc0b70 (rtnl_mutex){+.+.}, at:
rtnetlink_rcv_msg+0x457/0x890
[ 105.526864][ T1178] #1: ffff88806aa29c80
(team->team_lock_key#2){+.+.}, at: team_add_slave+0x89/0x1972 [team]
[ 105.527857][ T1178] #2: ffff88806aa29280
(&dev->addr_list_lock_key#3){+...}, at: team_add_slave+0x165d/0x1972
[team]
[ 105.528914][ T1178]
[ 105.528914][ T1178] stack backtrace:
[ 105.529505][ T1178] CPU: 3 PID: 1178 Comm: ip Not tainted 5.5.0-rc2+ #264
[ 105.530202][ T1178] Hardware name: innotek GmbH
VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 105.531417][ T1178] Call Trace:
[ 105.531824][ T1178] dump_stack+0x96/0xdb
[ 105.532333][ T1178] check_noncircular+0x371/0x450
[ 105.532963][ T1178] ? print_circular_bug.isra.36+0x310/0x310
[ 105.533575][ T1178] ? stack_trace_save+0x82/0xb0
[ 105.534058][ T1178] ? hlock_class+0x130/0x130
[ 105.534549][ T1178] ? __lock_acquire+0x2d8d/0x3de0
[ 105.535036][ T1178] __lock_acquire+0x2d8d/0x3de0
[ 105.535545][ T1178] ? register_lock_class+0x14d0/0x14d0
[ 105.536034][ T1178] ? find_held_lock+0x39/0x1d0
[ 105.536469][ T1178] lock_acquire+0x164/0x3b0
[ 105.536958][ T1178] ? dev_uc_sync_multiple+0x95/0x120
[ 105.537736][ T1178] _raw_spin_lock+0x30/0x70
[ 105.538398][ T1178] ? dev_uc_sync_multiple+0x95/0x120
[ 105.539260][ T1178] dev_uc_sync_multiple+0x95/0x120
[ 105.540213][ T1178] team_add_slave+0x1668/0x1972 [team]
[ 105.541113][ T1178] ? team_init+0x7b0/0x7b0 [team]
[ 105.541760][ T1178] ? mutex_is_locked+0x13/0x50
[ 105.542359][ T1178] ? rtnl_is_locked+0x11/0x20
[ 105.542984][ T1178] ? netdev_master_upper_dev_get+0xf/0x120
[ 105.543734][ T1178] do_setlink+0xaab/0x2ef0
[ 105.544296][ T1178] ? is_bpf_text_address+0x81/0xe0
[ ... ]
After "ip link set team0 master team1", the "team1 -> team0" locking path
will be recorded in lockdep key of both team1 and team0.
Then, if "ip link set team1 master team0" is executed, "team0 -> team1"
locking path also will be recorded in lockdep key. At this moment,
lockdep will catch possible deadlock situation and it prints the above
warning message. But, both "team0 -> team1" and "team1 -> team0"
will not be existing concurrently. so the above message is actually wrong.
In order to avoid this message, a recorded locking path should be
removed. So, both lockdep_unregister_key() and lockdep_register_key()
are needed.
Yes, netdev_update_lockdep_key() is not atomic so
"WARNING: bad unlock balance in sch_direct_xmit"
message could be printed.
Thank you so much!
Taehee Yoo
Powered by blists - more mailing lists