Message-ID: <SA1PR21MB13358086D8E23229FCE692ABBF00A@SA1PR21MB1335.namprd21.prod.outlook.com>
Date: Wed, 26 Jul 2023 21:34:32 +0000
From: Dexuan Cui <decui@...rosoft.com>
To: Stefano Garzarella <sgarzare@...hat.com>,
Gary Guo <gary@...yguo.net>
CC: KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Wei Liu <wei.liu@...nel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"virtualization@...ts.linux-foundation.org"
<virtualization@...ts.linux-foundation.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Nischala Yelchuri <Nischala.Yelchuri@...rosoft.com>
Subject: RE: Hyper-V vsock streams do not fill the supplied buffer in full
> -----Original Message-----
> From: Stefano Garzarella <sgarzare@...hat.com>
> Sent: Thursday, July 6, 2023 3:02 AM
> To: Gary Guo <gary@...yguo.net>; Dexuan Cui <decui@...rosoft.com>
> Cc: KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> <haiyangz@...rosoft.com>; Wei Liu <wei.liu@...nel.org>; linux-
> hyperv@...r.kernel.org; virtualization@...ts.linux-foundation.org;
> netdev@...r.kernel.org; linux-kernel@...r.kernel.org
> Subject: Re: Hyper-V vsock streams do not fill the supplied buffer in full
>
> Hi Gary,
>
> On Wed, Jul 5, 2023 at 12:45 AM Gary Guo <gary@...yguo.net> wrote:
> >
> > When a vsock stream is called with recvmsg with a buffer, it only fills
> > the buffer with data from the first single VM packet. Even if there are
> > more VM packets at the time and the buffer is still not completely
> > filled, it will just leave the buffer partially filled.
> >
> > This causes some issues in WSLD, which uses the vsock in
> > non-blocking mode and uses epoll.
> >
> > For stream-oriented sockets, the epoll man page [1] says that
> >
> > > For stream-oriented files (e.g., pipe, FIFO, stream socket),
> > > the condition that the read/write I/O space is exhausted can
> > > also be detected by checking the amount of data read from /
> > > written to the target file descriptor. For example, if you
> > > call read(2) by asking to read a certain amount of data and
> > > read(2) returns a lower number of bytes, you can be sure of
> > > having exhausted the read I/O space for the file descriptor.
> >
> > This has been used as an optimisation in the wild for reducing the number
> > of syscalls required for stream sockets (by asserting that the socket
> > will not have to be polled to EAGAIN in edge-trigger mode, if the buffer
> > given to recvmsg is not filled completely). An example is Tokio,
> > starting in v1.21.0 [2].
> >
> > When this optimisation combines with the behaviour of Hyper-V vsock, it
> > causes an issue in this scenario:
> > * the VM host sends data to the guest, and it's split into multiple
> > VM packets
> > * sk_data_ready is called and epoll returns, notifying the userspace
> > that the socket is ready
> > * userspace calls recvmsg with a buffer, and it's partially filled
> > * userspace assumes that the stream socket is depleted, and if new data
> > arrives epoll will notify it again.
> > * kernel always considers the socket to be ready, and since it's in
> > edge-trigger mode, the epoll instance will never be notified again.
> >
> > This differing interpretation of readiness causes the userspace to
> > block forever.
>
> Thanks for the detailed description of the problem.
>
> I think we should fix the hvs_stream_dequeue() in
> net/vmw_vsock/hyperv_transport.c.
> We can do something similar to what we do in
> virtio_transport_stream_do_dequeue() in
> net/vmw_vsock/virtio_transport_common.c
>
> @Dexuan WDYT?
>
> Thanks,
> Stefano
(Sorry for the late response...)
Thanks Gary Guo for the good analysis!
I didn't realize that hvs_stream_dequeue() is supposed to
copy as much data as possible to the userspace in the case
of EPOLLET mode.
Yes, I think we should fix hvs_stream_dequeue(). We'll try to get
this fixed asap.
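To illustrate the intended behaviour, here is a minimal userspace sketch
of a dequeue loop that keeps copying across packet boundaries until the
caller's buffer is full or no data is left. This is not the actual
hvs_stream_dequeue() code: struct demo_pkt, struct demo_queue and
demo_stream_dequeue() are hypothetical names made up for this example,
and a plain array stands in for the pending VM packets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one pending VM packet; not a kernel type. */
struct demo_pkt {
	const char *data;	/* payload bytes */
	size_t len;		/* payload length */
	size_t off;		/* bytes already consumed from this packet */
};

/* Hypothetical stand-in for the queue of packets waiting to be read. */
struct demo_queue {
	struct demo_pkt *pkts;
	size_t count;
	size_t head;		/* first packet that is not fully consumed */
};

/*
 * Copy up to len bytes into buf, moving on to the next packet whenever
 * the current one is exhausted. A return value smaller than len means
 * the queue held no more data at the time of the call.
 */
size_t demo_stream_dequeue(struct demo_queue *q, char *buf, size_t len)
{
	size_t copied = 0;

	assert(q != NULL && buf != NULL);

	while (copied < len && q->head < q->count) {
		struct demo_pkt *p = &q->pkts[q->head];
		size_t avail = p->len - p->off;
		size_t want = len - copied;
		size_t n = want < avail ? want : avail;

		memcpy(buf + copied, p->data + p->off, n);
		p->off += n;
		copied += n;

		/* Packet fully consumed: advance to the next one. */
		if (p->off == p->len)
			q->head++;
	}

	return copied;
}
```

With this shape, a short read only happens when the queue is really
empty, which is the property the edge-triggered optimisation relies on.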
Thanks,
-- Dexuan