Message-ID: <188f255ca50e0e7a46e0fd139982e6ee3652bd7f.camel@redhat.com>
Date: Thu, 01 Dec 2022 12:25:13 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Wang ShaoBo <bobo.shaobowang@...wei.com>
Cc: liwei391@...wei.com, sameo@...ux.intel.com, kuba@...nel.org,
davem@...emloft.net, syzkaller-bugs@...glegroups.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nfc: llcp: Fix race in handling llcp_devices
On Tue, 2022-11-29 at 17:44 +0800, Wang ShaoBo wrote:
> There are multiple paths that operate on the llcp_devices list without
> protection:
>
>         CPU0                              CPU1
>
> nfc_unregister_device()           nfc_register_device()
>  nfc_llcp_unregister_device()      nfc_llcp_register_device() //no lock
>     ...                              list_add(local->list, llcp_devices)
>     local_release()
>       list_del(local->list)
>
>         CPU2
> ...
>  nfc_llcp_find_local()
>    list_for_each_entry(,&llcp_devices,)
>
> A race condition occurs if any two of the three run simultaneously, as
> in the following crash report. Although syzbot currently has no
> reproduction script, our artificially constructed test cases can also
> reproduce it:
>
> list_del corruption. prev->next should be ffff888060ce7000, but was ffff88802a0ad000. (prev=ffffffff8e536240)
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:59!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 16622 Comm: syz-executor.5 Not tainted 6.1.0-rc6-next-20221125-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
> RIP: 0010:__list_del_entry_valid.cold+0x12/0x72 lib/list_debug.c:59
> Code: f0 ff 0f 0b 48 89 f1 48 c7 c7 60 96 a6 8a 4c 89 e6 e8 4b 29 f0 ff 0f 0b 4c 89 e1 48 89 ee 48 c7 c7 c0 98 a6 8a e8 37 29 f0 ff <0f> 0b 48 89 ee 48 c7 c7 a0 97 a6 8a e8 26 29 f0 ff 0f 0b 4c 89 e2
> RSP: 0018:ffffc900151afd58 EFLAGS: 00010282
> RAX: 000000000000006d RBX: 0000000000000001 RCX: 0000000000000000
> RDX: ffff88801e7eba80 RSI: ffffffff8166001c RDI: fffff52002a35f9d
> RBP: ffff888060ce7000 R08: 000000000000006d R09: 0000000000000000
> R10: 0000000080000000 R11: 0000000000000000 R12: ffffffff8e536240
> R13: ffff88801f3f3000 R14: ffff888060ce1000 R15: ffff888079d855f0
> FS: 0000555556f57400(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f095d5ad988 CR3: 000000002155a000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> __list_del_entry include/linux/list.h:134 [inline]
> list_del include/linux/list.h:148 [inline]
> local_release net/nfc/llcp_core.c:171 [inline]
> kref_put include/linux/kref.h:65 [inline]
> nfc_llcp_local_put net/nfc/llcp_core.c:181 [inline]
> nfc_llcp_local_put net/nfc/llcp_core.c:176 [inline]
> nfc_llcp_unregister_device+0xb8/0x260 net/nfc/llcp_core.c:1619
> nfc_unregister_device+0x196/0x330 net/nfc/core.c:1179
> virtual_ncidev_close+0x52/0xb0 drivers/nfc/virtual_ncidev.c:163
> __fput+0x27c/0xa90 fs/file_table.c:320
> task_work_run+0x16f/0x270 kernel/task_work.c:179
> resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
> exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
> exit_to_user_mode_prepare+0x23c/0x250 kernel/entry/common.c:203
> __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
> syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
> do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> This patch adds a dedicated mutex, llcp_devices_list_lock, to make
> handling of the llcp_devices list safe.

Why a mutex instead of a spinlock? All the critical sections are very
small (both code- and time-wise), while the list of callers reaching
that code is quite large, making it hard to check that each of them is
really in process context.

Please switch to a spinlock instead.
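
To illustrate the pattern, here is a minimal userspace sketch, not the
actual patch. It assumes a hand-rolled list and a C11 atomic_flag spin
loop standing in for the kernel's spinlock_t with spin_lock() and
spin_unlock(). The names struct llcp_local, llcp_devices_lock,
llcp_register(), llcp_unregister() and llcp_find() are hypothetical
stand-ins chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfc_llcp_local (illustration only). */
struct llcp_local {
	int dev_idx;
	struct llcp_local *prev, *next;
};

/* Circular list head, playing the role of llcp_devices. */
static struct llcp_local llcp_devices = { -1, &llcp_devices, &llcp_devices };

/* Userspace substitute for a kernel spinlock_t (assumption, not the patch). */
static atomic_flag llcp_devices_lock = ATOMIC_FLAG_INIT;

static void devices_lock(void)
{
	/* Busy-wait: tolerable only because every critical section is tiny. */
	while (atomic_flag_test_and_set_explicit(&llcp_devices_lock,
						 memory_order_acquire))
		;
}

static void devices_unlock(void)
{
	atomic_flag_clear_explicit(&llcp_devices_lock, memory_order_release);
}

/* Register path: insert at the head of the list under the lock. */
void llcp_register(struct llcp_local *local)
{
	assert(local != NULL);
	devices_lock();
	local->next = llcp_devices.next;
	local->prev = &llcp_devices;
	llcp_devices.next->prev = local;
	llcp_devices.next = local;
	devices_unlock();
}

/* Release path: unlink under the same lock. */
void llcp_unregister(struct llcp_local *local)
{
	assert(local != NULL);
	devices_lock();
	local->prev->next = local->next;
	local->next->prev = local->prev;
	local->prev = local->next = NULL;
	devices_unlock();
}

/* Lookup path: walk the list under the same lock. */
struct llcp_local *llcp_find(int dev_idx)
{
	struct llcp_local *pos, *found = NULL;

	devices_lock();
	for (pos = llcp_devices.next; pos != &llcp_devices; pos = pos->next) {
		if (pos->dev_idx == dev_idx) {
			found = pos;
			break;
		}
	}
	devices_unlock();
	return found;
}
```

Note that in this sketch the pointer returned by llcp_find() is no
longer protected once the lock is dropped; real code would presumably
need to take a reference on the entry before unlocking.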
Cheers,
Paolo