Message-ID: <833d4550f9f6436ac8d61919e05272ba@anonaddy.me>
Date: Sat, 07 Feb 2026 18:56:49 +0000
From: agpn1b92@...naddy.me
To: sgarzare@...hat.com
Cc: virtualization@...ts.linux.dev, netdev@...r.kernel.org
Subject: Re: [BUG] vsock: poll() not waking on data arrival, causing
multi-second SSH delays
Hi Stefano and all,
Thank you Stefano for your response and skepticism about whether this was
a kernel issue - you were absolutely right to question it!
After extensive debugging with strace on both guest and host, I've
determined this was NOT a kernel bug at all, but rather an OpenSSH issue
specific to vsock connections.
Root Cause:
-----------
The 10-20 second delay was caused by OpenSSH's sshd attempting DNS lookups
on the literal string "UNKNOWN" (the placeholder hostname used for vsock
connections where no IP address exists). This triggered two 5-second DNS
timeouts during login recording and audit subsystem operations, accounting
for ~10 seconds of the delay.
The strace showed:
  17:11:14.465 sendmmsg(13, DNS query for "UNKNOWN")
  17:11:14.465 poll([{fd=13, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.005s>
  17:11:19.472 sendmmsg(13, DNS query for "UNKNOWN") [RETRY]
  17:11:19.472 poll([{fd=13, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.005s>
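For anyone who wants to check the same thing without strace, here is a minimal sketch that times a single resolver call on the placeholder name. The helper name time_lookup and the injectable resolver argument are my own illustration, not anything from sshd; by default it goes through the system resolver via socket.getaddrinfo.

```python
import socket
import time


def time_lookup(name, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, resolved) for one resolver call on name.

    resolver is any callable taking (host, port); it defaults to the
    system resolver. Passing a stub makes the helper testable offline.
    """
    start = time.monotonic()
    try:
        resolver(name, None)
        resolved = True
    except OSError:
        # socket.gaierror is a subclass of OSError, so a failed or
        # timed-out lookup ends up here.
        resolved = False
    return time.monotonic() - start, resolved
```

Calling time_lookup("UNKNOWN") on an affected guest should take several seconds if the resolver is timing out as in the trace above, and return almost immediately once the name resolves locally.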
Why I Initially Thought It Was a Kernel Issue:
----------------------------------------------
- bpftrace showed ppoll() timeouts while data appeared to be queued
- The pattern looked like a classic lost wakeup race condition
However, the vsock kernel modules were working perfectly. The delay
happened in userspace during sshd's session setup, specifically when
mm_record_login() tried to resolve the peer hostname for logging.
The Fix:
--------
OpenSSH 10.1 and 10.2 include fixes to prevent passing "UNKNOWN" to
subsystems that would attempt DNS resolution:
- 10.1: Skip audit logging for UNKNOWN hostnames
- 10.2: Don't set PAM_RHOST when remote host is "UNKNOWN"
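The general shape of these fixes, as I understand it, is a guard that skips resolution when the peer name is the placeholder. The sketch below only illustrates that pattern; it is not OpenSSH code (which is C), and lookup_for_logging and its resolve parameter are hypothetical names.

```python
PLACEHOLDER_HOST = "UNKNOWN"  # placeholder hostname described in this report


def lookup_for_logging(remote_host, resolve):
    """Resolve remote_host for logging, unless it is the placeholder.

    resolve is any callable mapping a hostname to an address; it is a
    stand-in for whatever a logging or audit subsystem would call.
    """
    if remote_host == PLACEHOLDER_HOST:
        # No real address exists behind the placeholder, so a DNS query
        # could only time out. Skip it.
        return None
    return resolve(remote_host)
```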
References:
- https://github.com/openssh/openssh-portable/pull/388
- https://gitlab.archlinux.org/archlinux/packaging/packages/openssh/-/issues/16
- https://www.openssh.org/releasenotes.html
Workaround for older OpenSSH versions:
Add this entry to /etc/hosts:
  127.0.0.1 UNKNOWN
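To confirm the workaround is in place, a small sketch like the following can scan hosts-format text for the name. The function name hosts_maps_name is hypothetical, and the parsing is deliberately simplified (whitespace-separated fields, "#" comments, case-insensitive name match), so treat it as a quick check rather than a faithful model of the resolver.

```python
def hosts_maps_name(hosts_text, name):
    """Return the first address mapped to name in hosts-format text, else None."""
    wanted = name.lower()
    for line in hosts_text.splitlines():
        # Drop anything after a comment marker, then split into fields:
        # the address first, followed by one or more names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            return fields[0]
    return None
```

For example, hosts_maps_name(open("/etc/hosts").read(), "UNKNOWN") returns the mapped address once the entry has been added, and None otherwise.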
Apologies for the noise on netdev - the vsock kernel implementation is
working correctly. The misleading symptoms (PTY-specific, ppoll timeouts,
state between connections) made it appear kernel-related when it was
actually sshd's login recording code hitting DNS timeouts.
Thanks again for your help and for maintaining the vsock subsystem!
Best regards,
[Your name - don't forget to update it this time or you'll look even
more stupid]