Message-ID: <ef2c01b5-d38b-4409-bbd4-0484564657c9@linux.dev>
Date: Thu, 20 Jun 2024 17:05:25 +0800
From: Zhu Yanjun <yanjun.zhu@...ux.dev>
To: Leon Romanovsky <leon@...nel.org>
Cc: syzbot <syzbot+19ec7595e3aa1a45f623@...kaller.appspotmail.com>,
jgg@...pe.ca, linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
netdev@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [rdma?] WARNING in ib_uverbs_release_dev
On 2024/6/20 1:48, Leon Romanovsky wrote:
> On Wed, Jun 19, 2024 at 10:16:20PM +0800, Zhu Yanjun wrote:
>> On 2024/6/19 17:15, Leon Romanovsky wrote:
>>> On Tue, Jun 18, 2024 at 11:37:18PM -0700, syzbot wrote:
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit: 2ccbdf43d5e7 Merge tag 'for-linus' of git://git.kernel.org..
>>>> git tree: upstream
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=179e93fe980000
>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=fa0ce06dcc735711
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=19ec7595e3aa1a45f623
>>>> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>>>>
>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>
>>>> Downloadable assets:
>>>> disk image: https://storage.googleapis.com/syzbot-assets/27e64d7472ce/disk-2ccbdf43.raw.xz
>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/e1c494bb5c9c/vmlinux-2ccbdf43.xz
>>>> kernel image: https://storage.googleapis.com/syzbot-assets/752498985a5e/bzImage-2ccbdf43.xz
>>>>
>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>> Reported-by: syzbot+19ec7595e3aa1a45f623@...kaller.appspotmail.com
>>>>
>>>> smc: removing ib device syz0
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 0 PID: 51 at kernel/rcu/srcutree.c:653 cleanup_srcu_struct+0x404/0x4d0 kernel/rcu/srcutree.c:653
>>>> Modules linked in:
>>>> CPU: 0 PID: 51 Comm: kworker/u8:3 Not tainted 6.10.0-rc3-syzkaller-00044-g2ccbdf43d5e7 #0
>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
>>>> Workqueue: ib-unreg-wq ib_unregister_work
>>>> RIP: 0010:cleanup_srcu_struct+0x404/0x4d0 kernel/rcu/srcutree.c:653
>>>> Code: 12 80 00 48 c7 03 00 00 00 00 48 83 c4 48 5b 41 5c 41 5d 41 5e 41 5f 5d e9 14 67 34 0a 90 0f 0b 90 eb e7 90 0f 0b 90 eb e1 90 <0f> 0b 90 eb db 90 0f 0b 90 eb 0a 90 0f 0b 90 eb 04 90 0f 0b 90 48
>>>> RSP: 0018:ffffc90000bb7970 EFLAGS: 00010202
>>>> RAX: 0000000000000001 RBX: ffff88802a1bc980 RCX: 0000000000000002
>>>> RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffe8ffffd74c58
>>>> RBP: 0000000000000001 R08: ffffe8ffffd74c5f R09: 1ffffd1ffffae98b
>>>> R10: dffffc0000000000 R11: fffff91ffffae98c R12: dffffc0000000000
>>>> R13: ffff88802285b5f0 R14: ffff88802285b000 R15: ffff88802a1bc800
>>>> FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 00007fa3852cae10 CR3: 000000000e132000 CR4: 0000000000350ef0
>>>> Call Trace:
>>>> <TASK>
>>>> ib_uverbs_release_dev+0x4e/0x80 drivers/infiniband/core/uverbs_main.c:136
>>>> device_release+0x9b/0x1c0
>>>> kobject_cleanup lib/kobject.c:689 [inline]
>>>> kobject_release lib/kobject.c:720 [inline]
>>>> kref_put include/linux/kref.h:65 [inline]
>>>> kobject_put+0x231/0x480 lib/kobject.c:737
>>>> remove_client_context+0xb9/0x1e0 drivers/infiniband/core/device.c:776
>>>> disable_device+0x13b/0x360 drivers/infiniband/core/device.c:1282
>>>> __ib_unregister_device+0x6d/0x170 drivers/infiniband/core/device.c:1475
>>>> ib_unregister_work+0x19/0x30 drivers/infiniband/core/device.c:1586
>>>> process_one_work kernel/workqueue.c:3231 [inline]
>>>> process_scheduled_works+0xa2e/0x1830 kernel/workqueue.c:3312
>>>> worker_thread+0x86d/0xd70 kernel/workqueue.c:3393
>>>> kthread+0x2f2/0x390 kernel/kthread.c:389
>>>> ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
>>>> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>>>> </TASK>
>>>
>>> I see that this is caused by a call to ib_unregister_device_queued() as a
>>> response to the NETDEV_UNREGISTER event, but we don't flush anything before it.
>>> How can we be sure that the ib_device is no longer in use?
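A minimal userspace sketch of the ordering problem behind this question, under a simplified model: the trace only shows that a WARN_ON() inside cleanup_srcu_struct() fired while ib_uverbs_release_dev() was tearing down, not which check it was. The model below tracks one such invariant, that no reader may still be inside a read-side critical section when the structure is torn down, and refuses teardown (leaking instead of freeing) when it is violated. All names (model_srcu, model_read_lock(), model_cleanup(), ...) are hypothetical and are not the kernel SRCU API.

```c
/*
 * Hypothetical userspace model, NOT the kernel SRCU API: teardown is
 * only safe once every reader has left its critical section.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct model_srcu {
	atomic_int readers;   /* readers currently inside a critical section */
	bool cleaned_up;      /* set once teardown has succeeded */
};

static void model_init(struct model_srcu *s)
{
	atomic_init(&s->readers, 0);
	s->cleaned_up = false;
}

static void model_read_lock(struct model_srcu *s)
{
	atomic_fetch_add(&s->readers, 1);
}

static void model_read_unlock(struct model_srcu *s)
{
	int prev = atomic_fetch_sub(&s->readers, 1);

	assert(prev > 0); /* an unbalanced unlock is a caller bug */
}

/*
 * Refuse to tear down while readers are still active: print a warning
 * and leave the state alone ("leak it") rather than free in-use memory.
 * Returns true on success, false if the caller tore down too early.
 */
static bool model_cleanup(struct model_srcu *s)
{
	int active = atomic_load(&s->readers);

	if (active != 0) {
		fprintf(stderr, "WARNING: cleanup with %d active reader(s)\n",
			active);
		return false;
	}
	s->cleaned_up = true;
	return true;
}
```

Used single-threaded: after model_read_lock(), model_cleanup() returns false and prints the warning; after the matching model_read_unlock(), it returns true. In the driver, the equivalent of that unlock has to be guaranteed (for example by flushing or waiting for all users) before the teardown runs, which is what the question above is about.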
>>
>> Hi, Leon
>>
>> This is the console output:
>>
>> https://syzkaller.appspot.com/x/log.txt?x=179e93fe980000
>>
>> From the above link, it seems that other devices or subsystems failed
>> first and then caused this call trace to appear. Once another problem had
>> occurred, the whole kernel was in a messy state, so it is not surprising
>> that further problems occurred.
>
> Which devices/subsystems failed? I grepped the log and don't see
> anything suspicious before the first "------------[ cut here ]------------"
> line.
I need a reproducer script to check this problem. It is an interesting problem.
Zhu Yanjun
>
>>
>> Simply put, the root cause is not in the RDMA subsystem.
>>
>> I will continue to delve into this problem.
>>
>> Zhu Yanjun
>>>
>>> Thanks
>>