Message-ID: <9259d368c091b071d16bd1969240f4e9dffe92fb.camel@redhat.com>
Date: Thu, 01 Feb 2024 19:49:23 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Eric Dumazet <edumazet@...gle.com>, "David S . Miller"
<davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, eric.dumazet@...il.com, syzbot
<syzkaller@...glegroups.com>, Jiri Pirko <jiri@...dia.com>
Subject: Re: [PATCH net] netdevsim: avoid potential loop in
nsim_dev_trap_report_work()
On Thu, 2024-02-01 at 17:53 +0000, Eric Dumazet wrote:
> Many syzbot reports include the following trace [1]
>
> If nsim_dev_trap_report_work() can not grab the mutex,
> it should rearm itself at least one jiffie later.
>
> [1]
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
> Workqueue: events nsim_dev_trap_report_work
> RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
> RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
> RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
> RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
> RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
> RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189
> Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00
> RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046
> RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3
> RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0
> RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e
> R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002
> R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8
> FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <NMI>
> </NMI>
> <TASK>
> instrument_atomic_read include/linux/instrumented.h:68 [inline]
> atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline]
> queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline]
> debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
> do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141
> __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline]
> _raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194
> debug_object_activate+0x349/0x540 lib/debugobjects.c:726
> debug_work_activate kernel/workqueue.c:578 [inline]
> insert_work+0x30/0x230 kernel/workqueue.c:1650
> __queue_work+0x62e/0x11d0 kernel/workqueue.c:1802
> __queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
> queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989
> queue_delayed_work include/linux/workqueue.h:563 [inline]
> schedule_delayed_work include/linux/workqueue.h:677 [inline]
> nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842
> process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
> process_scheduled_works kernel/workqueue.c:2706 [inline]
> worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
> kthread+0x2c6/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
> </TASK>
>
> Fixes: 012ec02ae441 ("netdevsim: convert driver to use unlocked devlink API during init/fini")
> Reported-by: syzbot <syzkaller@...glegroups.com>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Cc: Jiri Pirko <jiri@...dia.com>
> ---
> drivers/net/netdevsim/dev.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> index b4d3b9cde8bd685202f135cf9c845d1be76ef428..92a7a36b93ac0cc1b02a551b974fb390254ac484 100644
> --- a/drivers/net/netdevsim/dev.c
> +++ b/drivers/net/netdevsim/dev.c
> @@ -835,14 +835,14 @@ static void nsim_dev_trap_report_work(struct work_struct *work)
> trap_report_dw.work);
> nsim_dev = nsim_trap_data->nsim_dev;
>
> - /* For each running port and enabled packet trap, generate a UDP
> - * packet with a random 5-tuple and report it.
> - */
> if (!devl_trylock(priv_to_devlink(nsim_dev))) {
> - schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 0);
> + schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 1);
The patch LGTM, thanks!
I'm wondering if we have a similar problem in
devlink_rel_nested_in_notify_work():
	if (!devl_trylock(devlink)) {
		devlink_put(devlink);
		goto reschedule_work;
	}
	//...
reschedule_work:
	schedule_work(&rel->nested_in.notify_work);
And possibly adding a 1ms delay there could be problematic?
Cheers,
Paolo
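
For illustration only, the difference between rearming with a delay of 0 and a delay of 1 can be sketched with a small userspace model. This is not kernel code: simulate_rearm(), struct sim_result and the runs_per_tick parameter are hypothetical names invented for this sketch, and "tick" is just an assumed stand-in for a jiffy. The model counts how many times a self-rearming work function runs while someone else holds the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a self-rearming work item. */
struct sim_result {
	unsigned long attempts;	/* times the work function ran */
	bool acquired;		/* whether the trylock finally succeeded */
};

/*
 * The lock is held by someone else until tick `release_tick`.
 * `delay` is the rearm delay in ticks: 0 models rearming immediately,
 * 1 models rearming one tick later.
 * `runs_per_tick` is an assumed cap on how many times a zero-delay
 * work item can rerun within one tick (a stand-in for CPU speed).
 */
struct sim_result simulate_rearm(unsigned long release_tick,
				 unsigned long delay,
				 unsigned long runs_per_tick)
{
	struct sim_result res = { 0, false };
	unsigned long tick = 0;
	unsigned long runs_this_tick = 0;

	assert(runs_per_tick > 0);

	for (;;) {
		res.attempts++;
		if (tick >= release_tick) {
			/* trylock succeeds: the holder released the lock */
			res.acquired = true;
			return res;
		}
		if (delay > 0) {
			/* rearmed with a delay: next run is `delay` ticks later */
			tick += delay;
			runs_this_tick = 0;
		} else if (++runs_this_tick >= runs_per_tick) {
			/* rearmed with no delay: time only advances after
			 * the work has rerun runs_per_tick times */
			tick++;
			runs_this_tick = 0;
		}
	}
}
```

In this model, with the lock held for 3 ticks and an assumed 1000 reruns per tick, the zero-delay variant runs the work function 3001 times, while the one-tick-delay variant runs it 4 times. The real workqueue behaviour depends on scheduling details that the model does not capture.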