Message-ID: <d86d4ca3-5d3e-461f-ac8a-9d8715413dcf@redhat.com>
Date: Thu, 10 Jul 2025 10:30:49 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Zigit Zo <zuozhijie@...edance.com>, mst@...hat.com, jasowang@...hat.com,
xuanzhuo@...ux.alibaba.com, eperezma@...hat.com
Cc: virtualization@...ts.linux.dev, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net v2] virtio-net: fix a rtnl_lock() deadlock during probing
On 7/2/25 12:37 PM, Zigit Zo wrote:
> This bug happens if the VMM sends a VIRTIO_NET_S_ANNOUNCE request while
> the virtio-net driver is still probing with rtnl_lock() held; this
> causes a recursive mutex acquisition in netdev_notify_peers().
>
> Fix it by temporarily saving the announce status while probing; then, in
> virtnet_open(), if a delayed announce is pending, schedule
> virtnet_config_changed_work().
>
> Another possible solution is to check rtnl_is_locked() directly and call
> __netdev_notify_peers(), but that would mean relying on netdev_queue to
> schedule the ARP packets after ndo_open(), which we think is not very
> intuitive.
>
> We've observed a softlockup with Ubuntu 24.04; it can be reproduced by
> having QEMU send announce_self rapidly while booting.
>
> [ 494.167473] INFO: task swapper/0:1 blocked for more than 368 seconds.
> [ 494.167667] Not tainted 6.8.0-57-generic #59-Ubuntu
> [ 494.167810] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 494.168015] task:swapper/0 state:D stack:0 pid:1 tgid:1 ppid:0 flags:0x00004000
> [ 494.168260] Call Trace:
> [ 494.168329] <TASK>
> [ 494.168389] __schedule+0x27c/0x6b0
> [ 494.168495] schedule+0x33/0x110
> [ 494.168585] schedule_preempt_disabled+0x15/0x30
> [ 494.168709] __mutex_lock.constprop.0+0x42f/0x740
> [ 494.168835] __mutex_lock_slowpath+0x13/0x20
> [ 494.168949] mutex_lock+0x3c/0x50
> [ 494.169039] rtnl_lock+0x15/0x20
> [ 494.169128] netdev_notify_peers+0x12/0x30
> [ 494.169240] virtnet_config_changed_work+0x152/0x1a0
> [ 494.169377] virtnet_probe+0xa48/0xe00
> [ 494.169484] ? vp_get+0x4d/0x100
> [ 494.169574] virtio_dev_probe+0x1e9/0x310
> [ 494.169682] really_probe+0x1c7/0x410
> [ 494.169783] __driver_probe_device+0x8c/0x180
> [ 494.169901] driver_probe_device+0x24/0xd0
> [ 494.170011] __driver_attach+0x10b/0x210
> [ 494.170117] ? __pfx___driver_attach+0x10/0x10
> [ 494.170237] bus_for_each_dev+0x8d/0xf0
> [ 494.170341] driver_attach+0x1e/0x30
> [ 494.170440] bus_add_driver+0x14e/0x290
> [ 494.170548] driver_register+0x5e/0x130
> [ 494.170651] ? __pfx_virtio_net_driver_init+0x10/0x10
> [ 494.170788] register_virtio_driver+0x20/0x40
> [ 494.170905] virtio_net_driver_init+0x97/0xb0
> [ 494.171022] do_one_initcall+0x5e/0x340
> [ 494.171128] do_initcalls+0x107/0x230
> [ 494.171228] ? __pfx_kernel_init+0x10/0x10
> [ 494.171340] kernel_init_freeable+0x134/0x210
> [ 494.171462] kernel_init+0x1b/0x200
> [ 494.171560] ret_from_fork+0x47/0x70
> [ 494.171659] ? __pfx_kernel_init+0x10/0x10
> [ 494.171769] ret_from_fork_asm+0x1b/0x30
> [ 494.171875] </TASK>
>
> Fixes: df28de7b0050 ("virtio-net: synchronize operstate with admin state on up/down")
> Signed-off-by: Zigit Zo <zuozhijie@...edance.com>
@Michael: I think this addresses your concerns on v1, WDYT?
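
For anyone following along, here is a minimal userspace sketch of the deferral idea described in the commit message: an announce seen while probing (with the lock held) is only recorded, and the actual peer notification happens later from deferred work. This is not the patch itself. Every `model_*` name and the struct layout are hypothetical stand-ins, and the "mutex" is a plain flag whose assert represents the point where the real non-recursive rtnl mutex would block forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the driver. */
struct model_dev {
    bool rtnl_held;        /* stands in for the non-recursive rtnl mutex */
    bool probing;          /* true while the probe path is running */
    bool announce_pending; /* announce seen during probe, not yet handled */
    int  peers_notified;   /* number of completed peer notifications */
};

static void model_rtnl_lock(struct model_dev *d)
{
    /* A real mutex would deadlock here; the model asserts instead. */
    assert(!d->rtnl_held);
    d->rtnl_held = true;
}

static void model_rtnl_unlock(struct model_dev *d)
{
    assert(d->rtnl_held);
    d->rtnl_held = false;
}

/* Takes the lock itself, like netdev_notify_peers() in the trace. */
static void model_notify_peers(struct model_dev *d)
{
    model_rtnl_lock(d);
    d->peers_notified++;
    model_rtnl_unlock(d);
}

/* Config-change handler: while probing, only record the announce. */
static void model_config_changed(struct model_dev *d, bool announce)
{
    if (!announce)
        return;
    if (d->probing) {
        d->announce_pending = true;
        return;
    }
    model_notify_peers(d);
}

/* Probe calls the handler with the lock held, as in the trace. */
static void model_probe(struct model_dev *d, bool announce)
{
    d->probing = true;
    model_rtnl_lock(d);
    model_config_changed(d, announce);
    model_rtnl_unlock(d);
    d->probing = false;
}

/* Open: consume the pending flag and report whether work is needed. */
static bool model_open(struct model_dev *d)
{
    bool need_work = d->announce_pending;

    d->announce_pending = false;
    return need_work;
}

/* Deferred work runs later, without the lock held by the caller. */
static void model_deferred_work(struct model_dev *d)
{
    model_config_changed(d, true);
}
```

In this model, dropping the `probing` check in `model_config_changed()` makes `model_probe()` trip the assert in `model_rtnl_lock()`, which is the modeled equivalent of the hung task in the trace above.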
/P