Message-ID: <154EA998-3FBB-41E9-B07E-4841B027B1B5@nutanix.com>
Date: Wed, 10 Sep 2025 18:58:18 +0000
From: Jon Kohler <jon@...anix.com>
To: "patchwork-bot+netdevbpf@...nel.org" <patchwork-bot+netdevbpf@...nel.org>
CC: "mst@...hat.com" <mst@...hat.com>,
"jasowang@...hat.com"
<jasowang@...hat.com>,
"eperezma@...hat.com" <eperezma@...hat.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"virtualization@...ts.linux.dev"
<virtualization@...ts.linux.dev>,
"netdev@...r.kernel.org"
<netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
Subject: vhost_iotlb_miss tight loop lockup - RE vhost/net: Defer TX queue
re-enable until after sendmsg
> On May 5, 2025, at 9:40 PM, patchwork-bot+netdevbpf@...nel.org wrote:
>
> Hello:
>
> This patch was applied to netdev/net-next.git (main)
> by Jakub Kicinski <kuba@...nel.org>:
Hey all,
Writing to send up a flare and point out a problem that we're seeing
with this patch internally, specifically when we enable iommu on the
virtio-net device.
With this patch applied on a 6.12.y-based bare metal instance and then
starting a 6.12.y-based guest with iommu enabled, we see lockups
within the guest in short order, as well as the vmm (qemu) stuck in a
tight loop responding to IOTLB misses from the vhost-net worker loop.
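
For context, the cycle qemu is stuck servicing is the vhost device-IOTLB
protocol on the vhost fd (assuming the v2 IOTLB message format): vhost
posts a VHOST_IOTLB_MISS message, and the VMM answers with a
VHOST_IOTLB_UPDATE, which is consumed by vhost_process_iotlb_msg() on the
host side. A minimal userspace sketch of that loop is below (not QEMU's
actual code; translate_iova() is a made-up stand-in for the viommu
lookup):

#include <stdint.h>
#include <unistd.h>
#include <linux/vhost.h>   /* struct vhost_msg_v2, VHOST_IOTLB_* */

/* Hypothetical viommu lookup: iova -> (uaddr, size, perm). */
extern int translate_iova(uint64_t iova, uint64_t *uaddr,
                          uint64_t *size, uint8_t *perm);

static void serve_iotlb_misses(int vhost_fd)
{
        struct vhost_msg_v2 msg;

        for (;;) {
                /* Blocks until vhost hits a missing device-IOTLB entry. */
                if (read(vhost_fd, &msg, sizeof(msg)) != sizeof(msg))
                        break;
                if (msg.type != VHOST_IOTLB_MSG_V2 ||
                    msg.iotlb.type != VHOST_IOTLB_MISS)
                        continue;

                struct vhost_msg_v2 reply = { 0 };

                reply.type = VHOST_IOTLB_MSG_V2;
                reply.iotlb.type = VHOST_IOTLB_UPDATE;
                reply.iotlb.iova = msg.iotlb.iova;
                if (translate_iova(msg.iotlb.iova, &reply.iotlb.uaddr,
                                   &reply.iotlb.size, &reply.iotlb.perm))
                        continue;   /* real code would report the failure */

                /* This write is what shows up as vhost_chr_write_iter()
                 * -> vhost_process_iotlb_msg() in the profile below. */
                write(vhost_fd, &reply, sizeof(reply));
        }
}

Every TX buffer that misses in the device IOTLB has to bounce through one
of these read/write round trips, which lines up with where the qemu main
thread and the vhost worker are burning time in the profiles below.
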
We've bisected this in our internal tree, and the bisect lands squarely
on this patch, so I wanted to point out that there is some sort of
issue here.
Working on trying to figure this out, but if anything jumps off the
page to anyone, I'm happy to take advice!
Flamegraph:
https://gist.github.com/JonKohler/0e83c014230ab59ddc950f10441335f1#file-iotlb-lockup-svg
Guest dmesg errors like so:
[ 66.081694] virtio_net virtio0 eth0: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 5500 ms
[ 68.145155] virtio_net virtio0 eth0: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 7560000 usecs ago
[ 112.907012] virtio_net virtio0 eth0: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 5568 ms
[ 124.117540] virtio_net virtio0 eth0: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 16776000 usecs ago
[ 124.118050] virtio_net virtio0 eth0: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 16776 ms
[ 124.118447] virtio_net virtio0 eth0: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 16776000 usecs ago
Host-level top output:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3992758 qemu      20   0   16.6g  52168  26704 R  99.9   0.0  21:23.72 qemu-kvm        <<< this is the qemu main thread
3992769 qemu      20   0   16.6g  52168  26704 R  58.8   0.0  13:33.44 vhost-3992758   <<< this is the vhost-net kthread
For qemu-kvm main thread:
Samples: 13K of event 'cycles:P', 4000 Hz, Event count (approx.): 5131922583 lost: 0/0 drop: 0/0
Children Self Shared Object Symbol
- 87.41% 0.30% [kernel] [k] entry_SYSCALL_64_after_hwframe
- 87.11% entry_SYSCALL_64_after_hwframe
- do_syscall_64
- 44.79% ksys_write
- 43.74% vfs_write
- 40.96% vhost_chr_write_iter
- 38.22% vhost_process_iotlb_msg
- 13.72% vhost_iotlb_add_range_ctx
- 7.43% vhost_iotlb_map_free
- 4.37% vhost_iotlb_itree_remove
rb_next
1.78% __rb_erase_color
0.73% kfree
1.15% __rb_insert_augmented
0.68% __kmalloc_cache_noprof
- 10.73% vhost_vq_work_queue
- 7.65% try_to_wake_up
- 2.55% ttwu_queue_wakelist
- 1.72% __smp_call_single_queue
1.36% call_function_single_prep_ipi
- 1.32% __task_rq_lock
- _raw_spin_lock
native_queued_spin_lock_slowpath
- 1.30% select_task_rq
- select_task_rq_fair
- 0.88% wake_affine
available_idle_cpu
2.06% llist_add_batch
- 4.05% __mutex_lock.constprop.0
2.14% mutex_spin_on_owner
0.72% osq_lock
3.00% mutex_lock
- 1.72% kfree
- 1.16% __slab_free
slab_update_freelist.constprop.0.isra.0
1.37% _raw_spin_lock
1.08% mutex_unlock
1.98% _copy_from_iter
- 1.86% rw_verify_area
- security_file_permission
- 1.13% file_has_perm
0.69% avc_has_perm
0.63% fdget_pos
- 27.86% syscall_exit_to_user_mode
- syscall_exit_to_user_mode_prepare
- 25.96% __audit_syscall_exit
- 25.03% __audit_filter_op
6.66% audit_filter_rules.constprop.0
1.27% audit_reset_context.part.0.constprop.0
- 10.86% ksys_read
- 9.37% vfs_read
- 6.67% vhost_chr_read_iter
1.48% _copy_to_iter
1.36% _raw_spin_lock
- 1.30% __wake_up
0.81% _raw_spin_lock_irqsave
- 1.25% vhost_enqueue_msg
_raw_spin_lock
- 1.83% rw_verify_area
- security_file_permission
- 1.03% file_has_perm
0.64% avc_has_perm
0.65% fdget_pos
0.57% fput
- 2.56% syscall_trace_enter
- 1.25% __seccomp_filter
seccomp_run_filters
0.54% __audit_syscall_entry
For the vhost-net thread:
Samples: 20K of event 'cycles:P', 4000 Hz, Event count (approx.): 7796456297 lost: 0/0 drop: 0/0
Children Self Shared Object Symbol
- 100.00% 3.38% [kernel] [k] vhost_task_fn
38.26% 0xffffffff930bb8c0
- 3.36% 0
ret_from_fork_asm
ret_from_fork
- 1.16% vhost_task_fn
- 2.35% vhost_run_work_list
- 1.67% handle_tx
- 7.09% __mutex_lock.constprop.0
6.64% mutex_spin_on_owner
- 0.84% vq_meta_prefetch
- 3.22% iotlb_access_ok
2.50% vhost_iotlb_itree_first
0.80% mutex_lock
- 0.75% handle_tx_copy
0.86% llist_reverse_order
>
> On Wed, 30 Apr 2025 19:04:28 -0700 you wrote:
>> In handle_tx_copy, TX batching processes packets below ~PAGE_SIZE and
>> batches up to 64 messages before calling sock->sendmsg.
>>
>> Currently, when there are no more messages on the ring to dequeue,
>> handle_tx_copy re-enables kicks on the ring *before* firing off the
>> batch sendmsg. However, sock->sendmsg incurs a non-zero delay,
>> especially if it needs to wake up a thread (e.g., another vhost worker).
>>
>> [...]
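
To recap what the patch changed (rough paraphrase, not the actual diff):
handle_tx_copy() used to re-enable guest notifications as soon as
vhost_get_vq_desc() reported an empty ring, and only then flush the
pending batch via vhost_tx_batch()/sendmsg; with the patch, the batch is
flushed first and notifications are re-enabled afterwards, re-polling if
new descriptors arrived in the meantime:

/* Old ordering (paraphrased): */
        if (head == vq->num) {
                /* ring empty: kick re-enabled *before* the batch is sent */
                if (unlikely(vhost_enable_notify(&net->dev, vq))) {
                        vhost_disable_notify(&net->dev, vq);
                        continue;   /* more descriptors arrived, keep going */
                }
                break;
        }
        ...
        vhost_tx_batch(net, nvq, sock, &msg);   /* sendmsg of the batch */

/* New ordering (paraphrased): */
        if (head == vq->num)
                break;              /* leave kicks disabled for now */
        ...
        vhost_tx_batch(net, nvq, sock, &msg);   /* flush the batch first */
        if (unlikely(vhost_enable_notify(&net->dev, vq))) {
                vhost_disable_notify(&net->dev, vq);
                vhost_poll_queue(&vq->poll);    /* re-run for the new work */
        }
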
>
> Here is the summary with links:
> - [net-next,v3] vhost/net: Defer TX queue re-enable until after sendmsg
>   https://git.kernel.org/netdev/net-next/c/8c2e6b26ffe2
>
> You are awesome, thank you!
> --
> Deet-doot-dot, I am a bot.
> https://korg.docs.kernel.org/patchwork/pwbot.html
>
>