Message-ID: <aYYTder4-zvgOfV7@sgarzare-redhat>
Date: Fri, 6 Feb 2026 17:23:52 +0100
From: Stefano Garzarella <sgarzare@...hat.com>
To: agpn1b92@...naddy.me
Cc: virtualization@...ts.linux.dev, netdev@...r.kernel.org
Subject: Re: [BUG] vsock: poll() not waking on data arrival, causing
multi-second SSH delays
On Thu, Feb 05, 2026 at 07:53:33AM +0000, agpn1b92@...naddy.me wrote:
>Hi,
Hi [Your Name],
>
>I'm experiencing a bug where SSH sessions over vsock take 2-20+ seconds
>to establish due to poll() not signaling POLLIN when data is available.
>The bug does NOT occur on the first connection after VM boot, but affects
>all subsequent connections.
>
>* Summary
>
>- vsock poll() fails to return POLLIN when data is in the receive buffer
>- sshd-session's ppoll() times out every ~20ms instead of waking on data
>- First SSH connection after guest boot works instantly
>- All subsequent connections experience 2-20+ second delays
>- Non-PTY commands (ssh -T ... 'echo test') work instantly
Hmm, so I'm not sure whether this is related to the kernel, the user
space proxy, or something else. It would be nice to reproduce it
without ssh.
I tried with 6.18 on both guest and host, and I'm not able to reproduce
it.
Can you try to write a simple reproducer without ssh involved?
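As a starting point, here is a minimal sketch of what such an ssh-free
reproducer could look like, assuming Python 3 with AF_VSOCK support on
both guest and host. The function names and the port number 9999 are
placeholders for illustration, not anything taken from the report. The
guest runs a poll()-driven echo server and the host measures how long
poll() takes to report the echo.

```python
import select
import socket
import time

VSOCK_PORT = 9999  # hypothetical port, any free vsock port would do


def wait_readable(sock, timeout_ms):
    """Poll sock for POLLIN once; return (ready, elapsed_ms)."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    start = time.monotonic()
    events = poller.poll(timeout_ms)  # None blocks without a timeout
    elapsed_ms = (time.monotonic() - start) * 1000.0
    ready = any(mask & select.POLLIN for _fd, mask in events)
    return ready, elapsed_ms


def vsock_listener(port=VSOCK_PORT):
    """Guest side: listening vsock stream socket on any local CID."""
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.bind((socket.VMADDR_CID_ANY, port))
    s.listen()
    return s


def vsock_connect(cid, port=VSOCK_PORT):
    """Host side: connect to the guest with the given CID."""
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.connect((cid, port))
    return s


def serve_echo(listener):
    """Guest side: accept connections forever, echo data back via poll()."""
    while True:
        conn, _peer = listener.accept()
        with conn:
            while True:
                wait_readable(conn, None)  # block until poll() wakes up
                data = conn.recv(4096)
                if not data:  # peer closed the connection
                    break
                conn.sendall(data)


def timed_ping(sock, payload=b"ping", timeout_ms=5000):
    """Host side: send payload, wait for the echo with poll().

    Returns the poll() latency in ms, or None if poll() timed out.
    """
    sock.sendall(payload)
    ready, elapsed_ms = wait_readable(sock, timeout_ms)
    if not ready:
        return None
    sock.recv(len(payload))  # drain the (small) echo
    return elapsed_ms
```

The guest would run serve_echo(vsock_listener()), and the host would
call vsock_connect(5) followed by a few timed_ping() calls, then close
and reconnect several times, since the report says only connections
after the first one are slow. If the latencies stay small across
reconnects, that would point away from the kernel poll path.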
Thanks,
Stefano
>- TCP connections to the same VM work instantly
>
>* Environment
>
>Host:
>- OS: Arch Linux
>- Kernel: 6.18.2-arch2-1
>- QEMU: system package (latest)
>
>Guest:
>- OS: Debian trixie
>- Kernel: 6.17.13+deb13-amd64 (also tested on 6.12.57, same issue)
>- OpenSSH: 10.0p2
>
>QEMU command (relevant parts):
> qemu-system-x86_64 -enable-kvm -smp 8 \
> -object memory-backend-memfd,id=mem,size=20G,share=on \
> -machine memory-backend=mem \
> -device vhost-vsock-pci,guest-cid=5 \
> ...
>
>Connection method: ssh user@...ck/5 (via systemd-ssh-proxy)
>
>* Symptoms
>
>Interactive SSH (PTY) - SLOW:
> $ time ssh user@...ck/5
> # Takes 2-20+ seconds before shell prompt appears
>
>Non-interactive SSH - FAST:
> $ time ssh user@...ck/5 'echo test'
> test
> real 0m0.156s
>
>TCP to same VM - FAST:
> $ time ssh -p 33594 user@....0.0.1
> # Instant
>
>* Key observation: First connection after boot is fast
>
>After guest reboot:
> $ ssh user@...ck/5 # INSTANT (< 1 second)
> $ exit
> $ ssh user@...ck/5 # SLOW (2-20 seconds)
> $ ssh user@...ck/5 # SLOW
> ...
>
>This suggests the bug involves state that accumulates or isn't properly
>cleaned up between connections.
>
>** bpftrace evidence
>
>Using syscall tracepoints on guest during slow connection:
>
> === MINIMAL VSOCK DIAGNOSTIC ===
> [ 29 ms] sshd-session: ppoll() duration=19 ms ret=1
> ^^^ 20ms TIMEOUT pattern detected!
> [ 50 ms] sshd-session: ppoll() duration=20 ms ret=1
> ^^^ 20ms TIMEOUT pattern detected!
> [ 70 ms] sshd-session: ppoll() duration=18 ms ret=1
> ^^^ 20ms TIMEOUT pattern detected!
> ... (continues for ~2 seconds) ...
>
> [ 5000 ms] --- 5s stats: ppoll=455, timeouts=103, recv=0 (0 bytes) ---
>
> [19432 ms] sshd: recvmsg() = 308 bytes [4 µs]
> [19442 ms] sshd-session: recvmsg() = 308 bytes [4 µs]
>
>Pattern analysis:
>- ppoll() returns ret=1 (1 fd ready) but takes exactly ~20ms (timeout)
>- The ready fd is the PTY, NOT the vsock socket
>- recv=0 during the timeout phase: vsock data not being read
>- recvmsg() finally succeeds after ~19 seconds
>- When recvmsg() runs, it completes in 4 microseconds (data WAS there)
>
>This proves: data is sitting in the vsock receive buffer, but poll()
>is not returning POLLIN, so sshd doesn't know to read it.
>
>* 30-second summary from bpftrace
>
> Total ppoll calls: 488
> Timeouts (20ms pattern): 103
> Successful recvmsg: 6 (984 bytes)
> Timeout rate: 21%
>
>* Why PTY-specific?
>
>PTY sessions require bidirectional traffic:
>1. Server sends shell prompt → client must receive it
>2. Client sends keypress → server must receive it
>3. Server sends echo → client must receive it
>
>Each exchange relies on poll() waking on POLLIN. The bug causes poll()
>to miss the wakeup, forcing sshd to wait for its 20ms timeout fallback.
>
>Non-PTY commands do request-response-exit quickly before the bug
>manifests significantly.
>
>## Additional context
>
>I previously encountered the identical issue on WSL2's Hyper-V vsock
>implementation, suggesting this may be a fundamental issue with how
>vsock transports handle poll/wakeup semantics, not specific to virtio.
>
>## Hypothesis
>
>Based on the evidence, this appears to be a lost wakeup race condition:
>1. Host sends packet to guest
>2. Packet is enqueued to socket's rx_queue
>3. sk_data_ready() is called but poll waiters aren't properly woken
>4. vsock_poll() returns 0 (no POLLIN) despite data being available
>5. ppoll() times out after 20ms, sshd retries
>6. Eventually succeeds through timeout-based retry
>
>The "first connection works" pattern suggests the race involves
>existing state from previous connections - possibly worker threads,
>interrupt handlers, or virtqueue state that isn't properly reset.
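Step 4 of the hypothesis above could be checked from user space. The
following is only a sketch, assuming Python 3 on Linux and assuming the
vsock transport in use supports MSG_PEEK on stream sockets. The helper
names are made up for illustration. It compares a zero-timeout poll()
with a non-blocking peek on the same socket.

```python
import select
import socket


def poll_says_readable(sock):
    """Zero-timeout poll(): does the kernel report POLLIN right now?"""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    return any(mask & select.POLLIN for _fd, mask in poller.poll(0))


def peek_has_data(sock):
    """Non-blocking peek: is at least one byte queued for reading?"""
    try:
        data = sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return False  # nothing queued
    return len(data) > 0  # b"" means the peer closed, not pending data


def poll_missed_data(sock):
    """True when poll() reports no POLLIN although a peek finds data.

    Data may arrive between the two calls, so a single True is not
    conclusive; the check should be repeated before drawing conclusions.
    """
    readable = poll_says_readable(sock)
    return (not readable) and peek_has_data(sock)
```

If poll_missed_data() keeps returning True, the poll callback itself
would be reporting the wrong state (step 4). If it returns False while
a blocking poll() on the same socket still sleeps until its timeout,
that would look more like a missed wakeup (step 3).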
>
>## Reproducer
>
>1. Start QEMU VM with vhost-vsock-pci device
>2. Boot guest, ensure sshd is running
>3. From host: ssh user@...ck/<CID> # First connection is fast
>4. Exit and reconnect: ssh user@...ck/<CID> # Now slow
>
>## Request
>
>Could someone familiar with the vsock/virtio poll implementation
>review the wakeup path? Specifically:
>- virtio_transport_recv_pkt() -> sk_data_ready() path
>- vsock_poll() -> poll_wait() registration timing
>- Any state that persists between connections
>
>Happy to provide additional traces or test patches.
>
>Thanks,
>[Your Name]
>
>---
>bpftrace script used (runs on guest):
>
>#!/usr/bin/env bpftrace
>BEGIN {
> @start = nsecs;
> printf("=== MINIMAL VSOCK DIAGNOSTIC ===\n");
>}
>tracepoint:syscalls:sys_enter_ppoll {
> if (comm == "sshd-session" || comm == "sshd") {
> @ppoll_enter[tid] = nsecs;
> @ppoll_count++;
> }
>}
>tracepoint:syscalls:sys_exit_ppoll {
> if (@ppoll_enter[tid]) {
> $ms = (nsecs - @start) / 1000000;
> $dur = (nsecs - @ppoll_enter[tid]) / 1000000;
> if ($dur > 10) {
> printf("[%5lld ms] %s: ppoll() duration=%lld ms ret=%d\n",
> $ms, comm, $dur, args->ret);
> if ($dur >= 18 && $dur <= 25) {
> printf(" ^^^ 20ms TIMEOUT pattern detected!\n");
> @timeout_count++;
> }
> }
> delete(@ppoll_enter[tid]);
> }
>}
>tracepoint:syscalls:sys_exit_recvmsg {
> if (comm == "sshd-session" || comm == "sshd") {
> if (args->ret > 0) {
> $ms = (nsecs - @start) / 1000000;
> printf("[%5lld ms] %s: recvmsg() = %lld bytes\n", $ms, comm, args->ret);
> @recv_count++;
> @recv_bytes += args->ret;
> }
> }
>}
>interval:s:5 {
> printf("\n[%5lld ms] --- 5s stats: ppoll=%d, timeouts=%d, recv=%d (%d bytes) ---\n\n",
> (nsecs - @start) / 1000000, @ppoll_count, @timeout_count,
> @recv_count, @recv_bytes);
>}
>
>
>
>