[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 8 Aug 2023 11:35:02 +0000
From: Dragos Tatulea <dtatulea@...dia.com>
To: "jasowang@...hat.com" <jasowang@...hat.com>, "mst@...hat.com"
<mst@...hat.com>
CC: "virtualization@...ts.linux-foundation.org"
<virtualization@...ts.linux-foundation.org>, "davem@...emloft.net"
<davem@...emloft.net>, "edumazet@...gle.com" <edumazet@...gle.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "kuba@...nel.org"
<kuba@...nel.org>, "xuanzhuo@...ux.alibaba.com" <xuanzhuo@...ux.alibaba.com>,
"pabeni@...hat.com" <pabeni@...hat.com>
Subject: Re: [PATCH net] virtio-net: fix race between set queues and probe
On Tue, 2023-07-25 at 03:20 -0400, Jason Wang wrote:
> A race were found where set_channels could be called after registering
> but before virtnet_set_queues() in virtnet_probe(). Fixing this by
> moving the virtnet_set_queues() before netdevice registering. While at
> it, use _virtnet_set_queues() to avoid holding rtnl as the device is
> not even registered at that time.
>
> Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
> Signed-off-by: Jason Wang <jasowang@...hat.com>
> ---
> drivers/net/virtio_net.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0db14f6b87d3..1270c8d23463 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev)
> if (vi->has_rss || vi->has_rss_hash_report)
> virtnet_init_default_rss(vi);
>
> + _virtnet_set_queues(vi, vi->curr_queue_pairs);
> +
> /* serialize netdev register + virtio_device_ready() with ndo_open()
> */
> rtnl_lock();
>
> @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev)
> goto free_unregister_netdev;
> }
>
> - virtnet_set_queues(vi, vi->curr_queue_pairs);
> -
> /* Assume link up if device can't report link status,
> otherwise get link status from config. */
> netif_carrier_off(dev);
This change seems to break mlx5_vdpa when using virtio_vdpa.
_virtnet_set_queues() hangs. Because DRIVER_OK has not yet been set, mlx5_vdpa
cvq kick handler will ignore any commands.
Output:
[ 199.681445] mlx5_core 0000:08:00.0: E-Switch: Enable: mode(OFFLOADS),
nvfs(1), necvfs(0), active vports(2)
[ 199.690154] mlx5_core 0000:08:00.2: firmware version: 22.38.458
[ 199.862816] mlx5_core 0000:08:00.2: Rate limit: 127 rates are supported,
range: 0Mbps to 97656Mbps
[ 199.895653] mlx5_core 0000:08:00.2: Assigned random MAC address
b2:a1:71:c0:28:a2
[ 200.036900] mlx5_core 0000:08:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048)
RxCqeCmprss(0 enhanced)
[ 202.215022] mlx5_core 0000:08:00.2: mlx5_vdpa_reset:2813:(pid 1294):
performing device reset
[ 223.228309] rcu: INFO: rcu_sched self-detected stall on CPU
[ 223.228870] rcu: 3-....: (5250 ticks this GP)
idle=1cdc/1/0x4000000000000000 softirq=1981/1981 fqs=2064
[ 223.229686] rcu: (t=5251 jiffies g=7061 q=4365 ncpus=10)
[ 223.230165] CPU: 3 PID: 1294 Comm: modprobe Not tainted 6.5.0-rc4+ #32
[ 223.230725] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-
1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 223.231657] RIP: 0010:virtqueue_get_buf_ctx+0x7/0x290
[ 223.232141] Code: ...
[ 223.233639] RSP: 0018:ffff888144fe7898 EFLAGS: 00000246
[ 223.234108] RAX: 0000000000000000 RBX: ffff888103327940 RCX: ffff888100eca000
[ 223.234711] RDX: 0000000000000000 RSI: ffff888144fe78ac RDI: ffff888108c3b300
[ 223.235315] RBP: ffff888144fe78d0 R08: 0000000000000001 R09: ffff888103327940
[ 223.235921] R10: 0000000000000003 R11: 0000000000000008 R12: 0000000000000002
[ 223.236536] R13: 0000000000000004 R14: ffff8881037c1400 R15: ffff88810337f740
[ 223.237144] FS: 00007f55572f8740(0000) GS:ffff88846f980000(0000)
knlGS:0000000000000000
[ 223.237877] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 223.238401] CR2: 00007f5556d05c00 CR3: 000000012df3a002 CR4: 0000000000370ea0
[ 223.239016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 223.239630] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 223.240265] Call Trace:
[ 223.240570] <IRQ>
[ 223.240823] ? rcu_dump_cpu_stacks+0xc4/0x100
[ 223.241250] ? rcu_sched_clock_irq+0x407/0xaf0
[ 223.241677] ? trigger_load_balance+0x62/0x380
[ 223.242105] ? update_process_times+0x5b/0x90
[ 223.242527] ? tick_sched_timer+0x8b/0xa0
[ 223.242926] ? tick_sched_do_timer+0x80/0x80
[ 223.243346] ? __hrtimer_run_queues+0x121/0x270
[ 223.243777] ? hrtimer_interrupt+0xf4/0x230
[ 223.244193] ? __sysvec_apic_timer_interrupt+0x52/0xf0
[ 223.244684] ? sysvec_apic_timer_interrupt+0x69/0x90
[ 223.245150] </IRQ>
[ 223.245412] <TASK>
[ 223.245668] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 223.246160] ? virtqueue_get_buf_ctx+0x7/0x290
[ 223.246588] virtnet_send_command+0x113/0x170
[ 223.247008] ? virtnet_set_affinity+0x166/0x1b0
[ 223.247441] _virtnet_set_queues+0x9f/0x100
[ 223.247850] virtnet_probe+0xa27/0x1270
[ 223.248242] ? vdpa_get_config+0x5b/0x70 [vdpa]
[ 223.248699] ? virtio_vdpa_notify_with_data+0x30/0x30 [virtio_vdpa]
[ 223.249251] virtio_dev_probe+0x196/0x300
[ 223.249646] really_probe+0xc3/0x3d0
[ 223.250012] ? driver_probe_device+0x90/0x90
[ 223.250425] __driver_probe_device+0x80/0x160
[ 223.250846] driver_probe_device+0x1f/0x90
[ 223.251247] __device_attach_driver+0x7d/0x100
[ 223.251674] bus_for_each_drv+0x7d/0xd0
[ 223.252055] __device_attach+0xb2/0x1c0
[ 223.252455] bus_probe_device+0x86/0xa0
[ 223.252839] device_add+0x65b/0x870
[ 223.253200] ? mlx5_vdpa_set_status+0x2a/0x200 [mlx5_vdpa]
[ 223.253699] ? virtio_vdpa_notify_with_data+0x30/0x30 [virtio_vdpa]
[ 223.254255] register_virtio_device+0xb5/0xf0
[ 223.254678] virtio_vdpa_probe+0xae/0xf0 [virtio_vdpa]
[ 223.255156] really_probe+0xc3/0x3d0
[ 223.255518] ? __device_attach_driver+0x100/0x100
[ 223.255962] __driver_probe_device+0x80/0x160
[ 223.256409] driver_probe_device+0x1f/0x90
[ 223.256812] __driver_attach+0xe9/0x1b0
[ 223.257190] bus_for_each_dev+0x70/0xc0
[ 223.257574] bus_add_driver+0xf1/0x220
[ 223.257954] driver_register+0x55/0xf0
[ 223.258330] ? 0xffffffffa0647000
[ 223.258675] do_one_initcall+0x4c/0x1e0
[ 223.259060] ? kmalloc_trace+0x26/0x80
[ 223.259438] do_init_module+0x88/0x260
[ 223.259813] init_module_from_file+0x86/0xc0
[ 223.260245] __x64_sys_finit_module+0x16d/0x2e0
[ 223.260695] do_syscall_64+0x3d/0x90
[ 223.261063] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 223.261532] RIP: 0033:0x7f5556d2bd2d
[ 223.261894] Code: ...
[ 223.263415] RSP: 002b:00007ffc1ffcd908 EFLAGS: 00000246 ORIG_RAX:
0000000000000139
[ 223.264113] RAX: ffffffffffffffda RBX: 00005642fa82caf0 RCX: 00007f5556d2bd2d
[ 223.264736] RDX: 0000000000000000 RSI: 00005642f8ca3fc9 RDI: 0000000000000003
[ 223.265348] RBP: 00007ffc1ffcd9c0 R08: 0000000000000000 R09: 00007ffc1ffcd950
[ 223.265966] R10: 0000000000000003 R11: 0000000000000246 R12: 00005642f8ca3fc9
[ 223.266581] R13: 0000000000040000 R14: 00005642fa82cc20 R15: 0000000000000000
[ 223.267192] </TASK>
Thanks,
Dragos
Powered by blists - more mailing lists