lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+h21hqu=J5RH3UkYBt7=uxWNYvXWegFsbMnf3PoWyVHTpRPrQ@mail.gmail.com>
Date:   Wed, 13 May 2020 18:56:19 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Taehee Yoo <ap420073@...il.com>
Cc:     Cong Wang <xiyou.wangcong@...il.com>,
        Netdev <netdev@...r.kernel.org>,
        syzbot <syzbot+aaa6fa4949cc5d9b7b25@...kaller.appspotmail.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vivien Didelot <vivien.didelot@...il.com>
Subject: Re: [Patch net-next v2 1/2] net: partially revert dynamic lockdep key changes

Hi Cong, Taehee,

On Mon, 4 May 2020 at 21:45, Taehee Yoo <ap420073@...il.com> wrote:
>
> On Sun, 3 May 2020 at 14:22, Cong Wang <xiyou.wangcong@...il.com> wrote:
> >
>
> Hi Cong,
> Thank you for this work!
>
> > This patch reverts the folowing commits:
> >
> > commit 064ff66e2bef84f1153087612032b5b9eab005bd
> > "bonding: add missing netdev_update_lockdep_key()"
> >
> > commit 53d374979ef147ab51f5d632dfe20b14aebeccd0
> > "net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()"
> >
> > commit 1f26c0d3d24125992ab0026b0dab16c08df947c7
> > "net: fix kernel-doc warning in <linux/netdevice.h>"
> >
> > commit ab92d68fc22f9afab480153bd82a20f6e2533769
> > "net: core: add generic lockdep keys"
> >
> > but keeps the addr_list_lock_key because we still lock
> > addr_list_lock nestedly on stack devices, unlikely xmit_lock
> > this is safe because we don't take addr_list_lock on any fast
> > path.
> >
> > Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@...kaller.appspotmail.com
> > Cc: Dmitry Vyukov <dvyukov@...gle.com>
> > Cc: Taehee Yoo <ap420073@...il.com>
> > Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
>
> Acked-by: Taehee Yoo <ap420073@...il.com>
>
> Thank you,
> Taehee Yoo

I have a platform with the following layout:

      Regular NIC
       |
       +----> DSA master for switch port
               |
               +----> DSA master for another switch port

After changing DSA back to static lockdep class keys, I get this splat:

[   13.361198] ============================================
[   13.366524] WARNING: possible recursive locking detected
[   13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
[   13.377874] --------------------------------------------
[   13.383201] swapper/0/0 is trying to acquire lock:
[   13.388004] ffff0000668ff298
(&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at:
__dev_queue_xmit+0x84c/0xbe0
[   13.397879]
[   13.397879] but task is already holding lock:
[   13.403727] ffff0000661a1698
(&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at:
__dev_queue_xmit+0x84c/0xbe0
[   13.413593]
[   13.413593] other info that might help us debug this:
[   13.420140]  Possible unsafe locking scenario:
[   13.420140]
[   13.426075]        CPU0
[   13.428523]        ----
[   13.430969]   lock(&dsa_slave_netdev_xmit_lock_key);
[   13.435946]   lock(&dsa_slave_netdev_xmit_lock_key);
[   13.440924]
[   13.440924]  *** DEADLOCK ***
[   13.440924]
[   13.446860]  May be due to missing lock nesting notation
[   13.446860]
[   13.453668] 6 locks held by swapper/0/0:
[   13.457598]  #0: ffff800010003de0
((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
[   13.466593]  #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at:
mld_sendpack+0x0/0x560
[   13.474803]  #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2},
at: ip6_finish_output2+0x64/0xb10
[   13.483886]  #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2},
at: __dev_queue_xmit+0x6c/0xbe0
[   13.492793]  #4: ffff0000661a1698
(&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at:
__dev_queue_xmit+0x84c/0xbe0
[   13.503094]  #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2},
at: __dev_queue_xmit+0x6c/0xbe0
[   13.512000]
[   13.512000] stack backtrace:
[   13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
[   13.530421] Call trace:
[   13.532871]  dump_backtrace+0x0/0x1d8
[   13.536539]  show_stack+0x24/0x30
[   13.539862]  dump_stack+0xe8/0x150
[   13.543271]  __lock_acquire+0x1030/0x1678
[   13.547290]  lock_acquire+0xf8/0x458
[   13.550873]  _raw_spin_lock+0x44/0x58
[   13.554543]  __dev_queue_xmit+0x84c/0xbe0
[   13.558562]  dev_queue_xmit+0x24/0x30
[   13.562232]  dsa_slave_xmit+0xe0/0x128
[   13.565988]  dev_hard_start_xmit+0xf4/0x448
[   13.570182]  __dev_queue_xmit+0x808/0xbe0
[   13.574200]  dev_queue_xmit+0x24/0x30
[   13.577869]  neigh_resolve_output+0x15c/0x220
[   13.582237]  ip6_finish_output2+0x244/0xb10
[   13.586430]  __ip6_finish_output+0x1dc/0x298
[   13.590709]  ip6_output+0x84/0x358
[   13.594116]  mld_sendpack+0x2bc/0x560
[   13.597786]  mld_ifc_timer_expire+0x210/0x390
[   13.602153]  call_timer_fn+0xcc/0x400
[   13.605822]  run_timer_softirq+0x588/0x6e0
[   13.609927]  __do_softirq+0x118/0x590
[   13.613597]  irq_exit+0x13c/0x148
[   13.616918]  __handle_domain_irq+0x6c/0xc0
[   13.621023]  gic_handle_irq+0x6c/0x160
[   13.624779]  el1_irq+0xbc/0x180
[   13.627927]  cpuidle_enter_state+0xb4/0x4d0
[   13.632120]  cpuidle_enter+0x3c/0x50
[   13.635703]  call_cpuidle+0x44/0x78
[   13.639199]  do_idle+0x228/0x2c8
[   13.642433]  cpu_startup_entry+0x2c/0x48
[   13.646363]  rest_init+0x1ac/0x280
[   13.649773]  arch_call_rest_init+0x14/0x1c
[   13.653878]  start_kernel+0x490/0x4bc

Unfortunately I can't really test DSA behavior prior to patch
ab92d68fc22f ("net: core: add generic lockdep keys"), because in
October, some of these DSA drivers were not in mainline.
Also I don't really have a clear idea of how nesting should be
signalled to lockdep.
Do you have any suggestion what might be wrong?

Thanks,
-Vladimir

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ