lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 15 Jul 2019 19:16:45 +0200
From:   Oliver Hartkopp <socketcan@...tkopp.net>
To:     syzbot <syzbot+0f1827363a305f74996f@...kaller.appspotmail.com>,
        davem@...emloft.net, linux-can@...r.kernel.org,
        linux-kernel@...r.kernel.org, mkl@...gutronix.de,
        netdev@...r.kernel.org, syzkaller-bugs@...glegroups.com,
        Kirill Tkhai <ktkhai@...tuozzo.com>
Subject: Re: INFO: task hung in unregister_netdevice_notifier (3)

Hello all,

On 14.07.19 06:07, syzbot wrote:
> syzbot has found a reproducer for the following crash on:

the internal users of the CAN networking subsystem like CAN_BCM and 
CAN_RAW hold a number of CAN identifier subscriptions ('filters') for 
CAN netdevices (only type ARPHRD_CAN) in their socket data structures.

The per-socket netdevice notifier is used to manage the ad-hoc removal 
of these filters at netdevice removal time.

What I can see in the console output at

https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000

seems to be a race between an unknown register_netdevice_notifier() call 
("A") and the unregister_netdevice_notifier() ("B") likely invoked by 
bcm_release() ("C"):

[ 1047.294207][ T1049]  schedule+0xa8/0x270
[ 1047.318401][ T1049]  rwsem_down_write_slowpath+0x70a/0xf70
[ 1047.324114][ T1049]  ? downgrade_write+0x3c0/0x3c0
[ 1047.438644][ T1049]  ? mark_held_locks+0xf0/0xf0
[ 1047.443483][ T1049]  ? lock_acquire+0x190/0x410
[ 1047.448191][ T1049]  ? unregister_netdevice_notifier+0x7e/0x390
[ 1047.547227][ T1049]  down_write+0x13c/0x150
[ 1047.579535][ T1049]  ? down_write+0x13c/0x150
[ 1047.584106][ T1049]  ? __down_timeout+0x2d0/0x2d0
[ 1047.635356][ T1049]  ? mark_held_locks+0xf0/0xf0
[ 1047.640721][ T1049]  unregister_netdevice_notifier+0x7e/0x390  <- "B"
[ 1047.646667][ T1049]  ? __sock_release+0x89/0x280
[ 1047.709126][ T1049]  ? register_netdevice_notifier+0x630/0x630 <- "A"
[ 1047.715203][ T1049]  ? __kasan_check_write+0x14/0x20
[ 1047.775138][ T1049]  bcm_release+0x93/0x5e0                    <- "C"
[ 1047.795337][ T1049]  __sock_release+0xce/0x280
[ 1047.829016][ T1049]  sock_close+0x1e/0x30

The question to me is now:

Is the problem located in an (un)register_netdevice_notifier race OR is 
it generally a bad idea to call unregister_netdevice_notifier() in a 
sock release?

I've never seen that kind of problem in the wild. But if it would be the 
latter case wouldn't it be the same problem when someone unloads the 
kernel module at the 'wrong' time?

In commit 328fbe747ad46 ("net: Close race between {un, 
}register_netdevice_notifier() and setup_net()/cleanup_net()") Kirill 
Tkhai reviewed the calling site in CAN_RAW raw_release() which points to 
the same situation. Therefore added him to the recipient list.

Should down_write() be replaced with something like 
rwsem_down_write_slowpath()??

Regards,
Oliver

> HEAD commit:    a2d79c71 Merge tag 'for-5.3/io_uring-20190711' of 
> git://gi..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3539b1747f03988e
> dashboard link: 
> https://syzkaller.appspot.com/bug?extid=0f1827363a305f74996f
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1765c52fa00000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0f1827363a305f74996f@...kaller.appspotmail.com
> 
> INFO: task syz-executor.4:9527 blocked for more than 143 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.4  D28136  9527   9356 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.2:9528 blocked for more than 145 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.2  D28136  9528   9354 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02 
> 00 00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31 
> c9 31 f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.0:9529 blocked for more than 147 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.0  D28136  9529   9353 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.5:9533 blocked for more than 148 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.5  D28136  9533   9358 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02 
> 00 00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31 
> c9 31 f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.1:9534 blocked for more than 148 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.1  D28136  9534   9359 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.3:9535 blocked for more than 150 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.3  D28136  9535   9351 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/1049:
>   #0: 00000000ede263b0 (rcu_read_lock){....}, at: 
> debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5257
> 1 lock held by rsyslogd/9208:
>   #0: 00000000da20b59a (&f->f_pos_lock){+.+.}, at: 
> __fdget_pos+0xee/0x110 fs/file.c:801
> 2 locks held by getty/9298:
>   #0: 00000000e9efae0d (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 0000000007287a12 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9299:
>   #0: 00000000ad0733b0 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 0000000094dd5193 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9300:
>   #0: 00000000692c340f (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 00000000538c7d7d (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9301:
>   #0: 00000000116ea6c7 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 00000000a908a9f7 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9302:
>   #0: 0000000042704f01 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 0000000041cc8671 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9303:
>   #0: 000000001ef3b293 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 000000008b703302 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9304:
>   #0: 0000000095601bb0 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ