Date:	Sun, 24 Jan 2016 16:21:40 +0000
From:	Russell King - ARM Linux <linux@....linux.org.uk>
To:	Vivien Didelot <vivien.didelot@...oirfairelinux.com>,
	Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org
Subject: [BUG] Adding vlan to DSA port causes lockdep splat

Adding a vlan to a DSA switch port netdev causes the following lockdep
splat on v4.4.  It was triggered by:

# vconfig add lan5 2048
# ip link set lan5.2048 up

=============================================
[ INFO: possible recursive locking detected ]
4.4.0+ #41 Not tainted
---------------------------------------------
ip/1437 is trying to acquire lock:
 (_xmit_ETHER/1){+.....}, at: [<c0512190>] dev_mc_sync+0x4c/0x88

but task is already holding lock:
 (_xmit_ETHER/1){+.....}, at: [<c0512190>] dev_mc_sync+0x4c/0x88

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(_xmit_ETHER/1);
  lock(_xmit_ETHER/1);
 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by ip/1437:
 #0:  (rtnl_mutex){+.+.+.}, at: [<c051c5e8>] rtnl_lock+0x1c/0x20
 #1:  (&vlan_netdev_addr_lock_key){+.....}, at: [<c050af38>] dev_set_rx_mode+0x1c/0x30
 #2:  (_xmit_ETHER/1){+.....}, at: [<c0512190>] dev_mc_sync+0x4c/0x88

stack backtrace:
CPU: 1 PID: 1437 Comm: ip Not tainted 4.4.0+ #41
Hardware name: Marvell Armada 380/385 (Device Tree)
Backtrace:
[<c00133b4>] (dump_backtrace) from [<c00136fc>] (show_stack+0x18/0x1c)
 r6:c1126954 r5:c0a23e10 r4:00000000 r3:dc8ba600
[<c00136e4>] (show_stack) from [<c028d5c0>] (dump_stack+0x7c/0x98)
[<c028d544>] (dump_stack) from [<c00712dc>] (__lock_acquire+0x138c/0x1b98)
 r4:c0a68580 r3:ef352280
[<c006ff50>] (__lock_acquire) from [<c0071e88>] (lock_acquire+0x74/0x94)
 r10:ee9a3f10 r9:ee9b7d80 r8:00000000 r7:00000001 r6:00000001 r5:600f0013
 r4:00000000
[<c0071e14>] (lock_acquire) from [<c0658d38>] (_raw_spin_lock_nested+0x30/0x40)
 r7:ec017030 r6:ef01d178 r5:ee8a2800 r4:ef01d178
[<c0658d08>] (_raw_spin_lock_nested) from [<c0512190>] (dev_mc_sync+0x4c/0x88)
 r4:ef01d000
[<c0512144>] (dev_mc_sync) from [<c061d860>] (dsa_slave_set_rx_mode+0x28/0x38)
 r6:00000000 r5:ef01d000 r4:ee8a2800 r3:ef3e0b50
[<c061d838>] (dsa_slave_set_rx_mode) from [<c050aee4>] (__dev_set_rx_mode+0x64/0x9c)
 r5:c06b2768 r4:ee8a2800
[<c050ae80>] (__dev_set_rx_mode) from [<c05121c0>] (dev_mc_sync+0x7c/0x88)
 r6:ee8a2978 r5:00000000 r4:ee8a2800 r3:00000002
[<c0512144>] (dev_mc_sync) from [<bf134c5c>] (vlan_dev_set_rx_mode+0x1c/0x2c [8021q])
 r6:00000000 r5:bf1366d4 r4:ec017000 r3:bf134c40
[<bf134c40>] (vlan_dev_set_rx_mode [8021q]) from [<c050aee4>] (__dev_set_rx_mode+0x64/0x9c)
 r4:ec017000 r3:bf134c40
[<c050ae80>] (__dev_set_rx_mode) from [<c050af40>] (dev_set_rx_mode+0x24/0x30)
 r6:bf1366d4 r5:ec017000 r4:ec017178 r3:ef352280
[<c050af1c>] (dev_set_rx_mode) from [<c050b010>] (__dev_open+0xc4/0x108)
 r5:00000000 r4:ec017000
[<c050af4c>] (__dev_open) from [<c050b280>] (__dev_change_flags+0x94/0x150)
 r7:00001002 r6:00000001 r5:00001003 r4:ec017000
[<c050b1ec>] (__dev_change_flags) from [<c050b374>] (dev_change_flags+0x20/0x50)
 r8:00000000 r7:bf1366d4 r6:00001002 r5:0000013c r4:ec017000 r3:00000001
[<c050b354>] (dev_change_flags) from [<c051d004>] (do_setlink+0x2c8/0x76c)
 r8:00000000 r7:bf1366d4 r6:eeac3be0 r5:00000000 r4:ec017000 r3:00000001
[<c051cd3c>] (do_setlink) from [<c051e708>] (rtnl_newlink+0x464/0x700)
 r10:00000000 r9:00000000 r8:00000000 r7:eeac3ba0 r6:ee9a3f00 r5:ec017000
 r4:00000000
[<c051e2a4>] (rtnl_newlink) from [<c051e208>] (rtnetlink_rcv_msg+0x158/0x1f4)
 r10:00000000 r9:00000000 r8:eeac3d84 r7:00000000 r6:ee9b7d80 r5:00000000
 r4:ee9a3f00
[<c051e0b0>] (rtnetlink_rcv_msg) from [<c0538018>] (netlink_rcv_skb+0xb4/0xc8)
 r8:eeac3d84 r7:ee9b7d80 r6:c051e0b0 r5:ee9b7d80 r4:ee9a3f00
[<c0537f64>] (netlink_rcv_skb) from [<c051c664>] (rtnetlink_rcv+0x24/0x2c)
 r6:eda45c00 r5:00000020 r4:ee9b7d80 r3:000026fb
[<c051c640>] (rtnetlink_rcv) from [<c05379c4>] (netlink_unicast+0x198/0x1fc)
 r4:ef10c000 r3:c051c640
[<c053782c>] (netlink_unicast) from [<c0537e1c>] (netlink_sendmsg+0x348/0x368)
 r10:ee9b7d80 r8:00000000 r7:00000000 r6:00000020 r5:eda45c00 r4:eeac3f4c
[<c0537ad4>] (netlink_sendmsg) from [<c04eb68c>] (sock_sendmsg+0x1c/0x2c)
 r10:00000000 r9:00000000 r8:ec8af8c0 r7:00000000 r6:c08b74c8 r5:00000000
 r4:eeac3f4c
[<c04eb670>] (sock_sendmsg) from [<c04ec4c4>] (___sys_sendmsg+0x240/0x254)
[<c04ec284>] (___sys_sendmsg) from [<c04ed170>] (__sys_sendmsg+0x44/0x70)
 r10:00000000 r9:eeac2000 r8:c000ff04 r7:00000128 r6:00000000 r5:ec8af8c0
 r4:bedad654
[<c04ed12c>] (__sys_sendmsg) from [<c04ed1ac>] (SyS_sendmsg+0x10/0x14)
 r6:bedad640 r5:00000010 r4:0000000c
[<c04ed19c>] (SyS_sendmsg) from [<c000fd60>] (ret_fast_syscall+0x0/0x1c)


The problem seems to be centered around:

dev_set_rx_mode ->
	__dev_set_rx_mode -> vlan_dev_set_rx_mode -> dev_mc_sync ->
	__dev_set_rx_mode -> dsa_slave_set_rx_mode -> dev_mc_sync

and the lock taken in dev_mc_sync().  On the face of it, it appears
that the vlan 'nest_level' was set to 1.

SINGLE_DEPTH_NESTING is set to 1, and netif_addr_lock_nested() does:

        int subclass = SINGLE_DEPTH_NESTING;

        if (dev->netdev_ops->ndo_get_lock_subclass)
                subclass = dev->netdev_ops->ndo_get_lock_subclass(dev);

        spin_lock_nested(&dev->addr_list_lock, subclass);

This has the effect that DSA (which does not provide
ndo_get_lock_subclass) uses a subclass of '1'.  However, when vlan
calculates its nesting:

        vlan->nest_level = dev_get_nest_level(real_dev, is_vlan_dev) + 1;

is_vlan_dev() will be false for "real_dev" (that being the DSA device),
and dev_get_nest_level() returns zero if neither real_dev nor any of its
parents is a vlan device.  Hence, the vlan device's lock is also taken
with a subclass of '1'.

As both locks are taken with the same class/subclass, lockdep thinks
this can deadlock.

I don't think implementing what vlan does in DSA will solve this,
because I think:

	dsa->nest_level = dev_get_nest_level(parent, is_dsa_dev) + 1;

will also yield 1 - as its parent device will be the ethernet
interface attached to the switch, which will be the root of the
network device tree.

I don't see a solution to this at present.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
