[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220720093005.2unej4jnnvrn55f2@sgarzare-redhat>
Date: Wed, 20 Jul 2022 11:30:05 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Arseniy Krasnov <AVKrasnov@...rdevices.ru>
Cc: "David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
Krasnov Arseniy <oxffffaa@...il.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"virtualization@...ts.linux-foundation.org"
<virtualization@...ts.linux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
kernel <kernel@...rdevices.ru>
Subject: Re: [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set
POLLIN/POLLRDNORM
On Wed, Jul 20, 2022 at 06:07:47AM +0000, Arseniy Krasnov wrote:
>On 19.07.2022 15:58, Stefano Garzarella wrote:
>> On Mon, Jul 18, 2022 at 08:12:52AM +0000, Arseniy Krasnov wrote:
>>> Hello,
>>>
>>> during my experiments with zerocopy receive, i found, that in some
>>> cases, poll() implementation violates POSIX: when socket has non-
>>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>>> POLLRDNORM bits in 'revents' even number of bytes available to read
>>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>>> POLLIN flag and then tries to read data(for example using 'read()'
>>> call), but read call will be blocked, because SO_RCVLOWAT logic is
>>> supported in dequeue loop in af_vsock.c. But the same time, POSIX
>>> requires that:
>>>
>>> "POLLIN Data other than high-priority data may be read without
>>> blocking.
>>> POLLRDNORM Normal data may be read without blocking."
>>>
>>> See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>>>
>>> So, we have, that poll() syscall returns POLLIN, but read call will
>>> be blocked.
>>>
>>> Also in man page socket(7) i found that:
>>>
>>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>> socket as readable only if at least SO_RCVLOWAT bytes are available."
>>>
>>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>>> this case for TCP socket, it works as POSIX required.
>>
>> I tried to look at the code and it seems that only TCP complies with it or am I wrong?
>Yes, i checked AF_UNIX, it also don't care about that. It calls skb_queue_empty() that of
>course ignores SO_RCVLOWAT.
>>
>>>
>>> I've added some fixes to af_vsock.c and virtio_transport_common.c,
>>> test is also implemented.
>>>
>>> What do You think guys?
>>
>> Nice, thanks for fixing this and for the test!
>>
>> I left some comments, but I think the series is fine if we will support it in all transports.
>Ack
>>
>> I'd just like to understand if it's just TCP complying with it or I'm missing some check included in the socket layer that we could reuse.
>Seems sock_poll() which is socket layer entry point for poll() doesn't contain any such checks
>>
>> @David, @Jakub, @Paolo, any advice?
>>
>> Thanks,
>> Stefano
>>
>
>PS: moreover, i found one more interesting thing with TCP and poll: TCP receive logic wakes up poll waiter
>only when number of available bytes > SO_RCVLOWAT. E.g. it prevents "spurious" wake ups, when poll will be
>woken up because new data arrived, but POLLIN to allow user dequeue this data won't be set(as amount of data
>is too small).
>See tcp_data_ready() in net/ipv4/tcp_input.c
Do you mean that we should call sk->sk_data_ready(sk) checking
SO_RCVLOWAT?
It seems fine, maybe we can add vsock_data_ready() in af_vsock.c that
transports should call instead of calling sk->sk_data_ready(sk)
directly.
Then we can something similar to tcp_data_ready().
Thanks,
Stefano
Powered by blists - more mailing lists