[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <h63t6heovmyafu2lo6x6rzsbdbrhqhlbuol774ngbgshbycgdu@fgynzbmj5zn7>
Date: Mon, 4 Sep 2023 10:21:51 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Arseniy Krasnov <avkrasnov@...utedevices.com>
Cc: Stefan Hajnoczi <stefanha@...hat.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
Bobby Eshleman <bobby.eshleman@...edance.com>, kvm@...r.kernel.org, virtualization@...ts.linux-foundation.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org, kernel@...rdevices.ru,
oxffffaa@...il.com
Subject: Re: [PATCH net-next v7 4/4] vsock/virtio: MSG_ZEROCOPY flag support
On Sun, Sep 03, 2023 at 11:13:23AM +0300, Arseniy Krasnov wrote:
>
>
>On 01.09.2023 15:30, Stefano Garzarella wrote:
>> On Sun, Aug 27, 2023 at 11:54:36AM +0300, Arseniy Krasnov wrote:
>>> This adds handling of MSG_ZEROCOPY flag on transmission path: if this
>>> flag is set and zerocopy transmission is possible (enabled in socket
>>> options and transport allows zerocopy), then non-linear skb will be
>>> created and filled with the pages of user's buffer. Pages of user's
>>> buffer are locked in memory by 'get_user_pages()'. Second thing that
>>> this patch does is replace type of skb owning: instead of calling
>>> 'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. Reason of this
>>> change is that '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc'
>>> of socket, so to decrease this field correctly proper skb destructor is
>>> needed: 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
>>>
>>> Signed-off-by: Arseniy Krasnov <avkrasnov@...utedevices.com>
[...]
>>>
>>> -/* Returns a new packet on success, otherwise returns NULL.
>>> - *
>>> - * If NULL is returned, errp is set to a negative errno.
>>> - */
>>> -static struct sk_buff *
>>> -virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
>>> - size_t len,
>>> - u32 src_cid,
>>> - u32 src_port,
>>> - u32 dst_cid,
>>> - u32 dst_port)
>>> -{
>>> - const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
>>> - struct virtio_vsock_hdr *hdr;
>>> - struct sk_buff *skb;
>>> +static bool virtio_transport_can_zcopy(struct virtio_vsock_pkt_info *info,
>>> + size_t max_to_send)
>> ^
>> I'd call it `pkt_len`, `max_to_send` is confusing IMHO. I didn't
>> initially if it was the number of buffers or bytes.
>>
>>> +{
>>> + const struct virtio_transport *t_ops;
>>> + struct iov_iter *iov_iter;
>>> +
>>> + if (!info->msg)
>>> + return false;
>>> +
>>> + iov_iter = &info->msg->msg_iter;
>>> +
>>> + if (iov_iter->iov_offset)
>>> + return false;
>>> +
>>> + /* We can't send whole iov. */
>>> + if (iov_iter->count > max_to_send)
>>> + return false;
>>> +
>>> + /* Check that transport can send data in zerocopy mode. */
>>> + t_ops = virtio_transport_get_ops(info->vsk);
>>> +
>>> + if (t_ops->can_msgzerocopy) {
>>
>> So if `can_msgzerocopy` is not implemented, we always return true after
>> this point. Should we mention it in the .can_msgzerocopy documentation?
>
>Ops, this is my mistake, I must return 'false' in this case. Seems I didn't
>catch this problem with my tests, because there was no test case where
>zerocopy will fallback to copy!
>
>I'll fix it and add new test!
yep, I agree!
>
>>
>> Can we also mention in the commit description why this is need only for
>> virtio_tranport and not for vhost and loopback?
>>
>>> + int pages_in_iov = iov_iter_npages(iov_iter, MAX_SKB_FRAGS);
>>> + int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS);
>>> +
>>> + return t_ops->can_msgzerocopy(pages_to_send);
>>> + }
>>> +
>>> + return true;
>>> +}
>>> +
[...]
>>> @@ -270,6 +395,17 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>> break;
>>> }
>>>
>>> + /* This is last skb to send this portion of data. */
>>
>> Sorry I didn't get it :-(
>>
>> Can you elaborate this a bit more?
>
>I mean that we iterate over user's buffer here, allocating skb on each
>iteration. And for last skb for this buffer we initialize completion
>for user (we need to allocate one completion for one syscall).
Okay, so maybe we should explain better also in the code comment.
>
>Thanks for review, I'll fix all other comments and resend patchset when
>'net-next' will be opened again.
Cool, thanks!
Stefano
Powered by blists - more mailing lists