linux-kernel - Re: [syzbot] [net?] INFO: task hung in register_nexthop


Message-Id: <20240321222054.2462-1-hdanton@sina.com>
Date: Fri, 22 Mar 2024 06:20:54 +0800
From: Hillf Danton <hdanton@...a.com>
To: Antoine Tenart <atenart@...nel.org>
Cc: linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org,
	pabeni@...hat.com,
	syzkaller-bugs@...glegroups.com,
	Eric Dumazet <edumazet@...gle.com>,
	syzbot <syzbot+99b8125966713aa4b0c3@...kaller.appspotmail.com>
Subject: Re: [syzbot] [net?] INFO: task hung in register_nexthop_notifier (3)

On Thu, 21 Mar 2024 10:22:25 +0100 Antoine Tenart <atenart@...nel.org>
> Quoting Eric Dumazet (2024-03-18 15:46:37)
> > On Mon, Mar 18, 2024 at 12:26 PM syzbot
> > <syzbot+99b8125966713aa4b0c3@...kaller.appspotmail.com> wrote:
> > >
> > > INFO: task syz-executor.3:6975 blocked for more than 143 seconds.
> > >       Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > task:syz-executor.3  state:D stack:20920 pid:6975  tgid:6975  ppid:1      flags:0x00004006
> > > Call Trace:
> > >  <TASK>
> > >  context_switch kernel/sched/core.c:5400 [inline]
> > >  __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
> > >  __schedule_loop kernel/sched/core.c:6802 [inline]
> > >  schedule+0x149/0x260 kernel/sched/core.c:6817
> > >  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
> > >  __mutex_lock_common kernel/locking/mutex.c:684 [inline]
> > >  __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
> > >  register_nexthop_notifier+0x84/0x290 net/ipv4/nexthop.c:3863
> > >  nsim_fib_create+0x8a6/0xa70 drivers/net/netdevsim/fib.c:1587
> > >  nsim_drv_probe+0x747/0xb80 drivers/net/netdevsim/dev.c:1582
> > >  really_probe+0x29e/0xc50 drivers/base/dd.c:658
> > >  __driver_probe_device+0x1a2/0x3e0 drivers/base/dd.c:800
> > >  driver_probe_device+0x50/0x430 drivers/base/dd.c:830
> > >  __device_attach_driver+0x2d6/0x530 drivers/base/dd.c:958
> > >  bus_for_each_drv+0x24e/0x2e0 drivers/base/bus.c:457
> > >  __device_attach+0x333/0x520 drivers/base/dd.c:1030
> > >  bus_probe_device+0x189/0x260 drivers/base/bus.c:532
> > >  device_add+0x8ff/0xca0 drivers/base/core.c:3639
> > >  nsim_bus_dev_new drivers/net/netdevsim/bus.c:442 [inline]
> > >  new_device_store+0x3f2/0x890 drivers/net/netdevsim/bus.c:173
> > >  kernfs_fop_write_iter+0x3a4/0x500 fs/kernfs/file.c:334
> >
> > So we have a sysfs handler ultimately calling register_nexthop_notifier()
> > or any other network control path requiring RTNL.
> >
> > Note that we have rtnl_trylock() for a reason...
> 
> Mentioning the below in case that gives some ideas; feel free to
> disregard.
> 
> When I looked at similar issues a while ago the rtnl deadlock actually
> happened with the kernfs_node refcount; haven't looked at this one in
> detail though. The mutex in there was just preventing concurrent
> writers.
> 
> > Or maybe the reason is wrong, if we could change kernfs_fop_write_iter()
> > to no longer hold a mutex...

Better to do that after working out why RCU stalled [1].

5 locks held by kworker/u4:7/23559:
 #0: ffff888015ea4938 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
 #0: ffff888015ea4938 ((wq_completion)netns){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
 #1: ffffc90012b8fd20 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
 #1: ffffc90012b8fd20 (net_cleanup_work){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
 #2: ffffffff8f36d250 (pernet_ops_rwsem){++++}-{3:3}, at: cleanup_net+0x16a/0xcc0 net/core/net_namespace.c:591
 #3: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: cleanup_net+0x6af/0xcc0 net/core/net_namespace.c:627
 #4: ffffffff8e136440 (rcu_state.barrier_mutex){+.+.}-{3:3}, at: rcu_barrier+0x4c/0x550 kernel/rcu/tree.c:4064

[1] https://lore.kernel.org/lkml/0000000000009485160613eda067@google.com/

> 
> At the time I found a way to safely drop the refcount of those
> kernfs_node which then allowed calling rtnl_lock from sysfs handlers,
> https://lore.kernel.org/all/20231018154804.420823-1-atenart@kernel.org/T/
> 
> Note that this relied on how net devices are unregistered (calling
> device_del under rtnl and later waiting for refs on the netdev to drop
> outside of the lock; and a few other things), so extra modifications
> would be needed to generalize the approach. Also it's a tradeoff between
> fixing those deadlocks without rtnl_trylock and maintaining a quite
> complex logic...
> 
> Antoine
>
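
For readers unfamiliar with the rtnl_trylock() pattern mentioned in the quoted text, here is a minimal userspace sketch of the idea: a handler that may not block on a global lock tries to take it and, on contention, backs off so the caller can retry. This is not kernel code. The pthread mutex stands in for RTNL, and the names fake_rtnl, fake_setting and fake_attr_store are hypothetical, chosen only for illustration. Returning -EAGAIN is likewise an assumption of the sketch, standing in for however the real handler asks for the operation to be restarted.

```c
/* Userspace sketch only: a pthread mutex stands in for the kernel's RTNL. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the global rtnl mutex. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical piece of state that may only change under the lock. */
static int fake_setting;

/*
 * Hypothetical attribute "store" handler.
 *
 * Returns 0 on success, or -EAGAIN if the lock is contended. The handler
 * never sleeps waiting for the lock, so it cannot take part in a deadlock
 * where the lock holder is itself waiting for this handler to finish.
 */
static int fake_attr_store(int value)
{
	int rc;

	/* Try once; back off instead of blocking if someone holds the lock. */
	if (pthread_mutex_trylock(&fake_rtnl) != 0)
		return -EAGAIN;

	fake_setting = value;

	rc = pthread_mutex_unlock(&fake_rtnl);
	assert(rc == 0);
	(void)rc; /* keep the build quiet when NDEBUG is defined */
	return 0;
}
```

A caller would loop on -EAGAIN, ideally after releasing whatever it holds that the lock owner might be waiting on. The cost of the pattern is busy retrying under contention, which is the tradeoff the thread weighs against the more complex refcount-dropping approach.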