lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230704234532.532c8ee7.gary@garyguo.net>
Date:   Tue, 4 Jul 2023 23:45:32 +0100
From:   Gary Guo <gary@...yguo.net>
To:     "K. Y. Srinivasan" <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
        Stefano Garzarella <sgarzare@...hat.com>
Cc:     linux-hyperv@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Hyper-V vsock streams do not fill the supplied buffer in full

When a vsock stream is called with recvmsg with a buffer, it only fills
the buffer with data from the first single VM packet. Even if there are
more VM packets at the time and the buffer is still not completely
filled, it will just leave the buffer partially filled.

This causes some issues when in WSLD which uses the vsock in
non-blocking mode and uses epoll.

For stream-oriented sockets, the epoll man page [1] says that

> For stream-oriented files (e.g., pipe, FIFO, stream socket),
> the condition that the read/write I/O space is exhausted can
> also be detected by checking the amount of data read from /
> written to the target file descriptor.  For example, if you
> call read(2) by asking to read a certain amount of data and
> read(2) returns a lower number of bytes, you can be sure of
> having exhausted the read I/O space for the file descriptor.

This has been used as an optimisation in the wild for reducing number
of syscalls required for stream sockets (by asserting that the socket
will not have to polled to EAGAIN in edge-trigger mode, if the buffer
given to recvmsg is not filled completely). An example is Tokio, which
starting in v1.21.0 [2].

When this optimisation combines with the behaviour of Hyper-V vsock, it
causes issue in this scenario:
* the VM host send data to the guest, and it's splitted into multiple
  VM packets
* sk_data_ready is called and epoll returns, notifying the userspace
  that the socket is ready
* userspace call recvmsg with a buffer, and it's partially filled
* userspace assumes that the stream socket is depleted, and if new data
  arrives epoll will notify it again.
* kernel always considers the socket to be ready, and since it's in
  edge-trigger mode, the epoll instance will never be notified again.

This different realisation of the readiness causes the userspace to
block forever.

[0] https://github.com/nbdd0121/wsld/issues/32
[1] https://man7.org/linux/man-pages/man7/epoll.7.html#:~:text=For%20stream%2Doriented%20files
[2] https://github.com/tokio-rs/tokio/pull/4840

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ