Date:   Tue, 28 Feb 2023 11:26:19 +0100
From:   Stefano Garzarella <sgarzare@...hat.com>
To:     Krasnov Arseniy <AVKrasnov@...rdevices.ru>
Cc:     Stefan Hajnoczi <stefanha@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Krasnov Arseniy <oxffffaa@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "virtualization@...ts.linux-foundation.org" 
        <virtualization@...ts.linux-foundation.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        kernel <kernel@...rdevices.ru>
Subject: Re: [RFC PATCH v1 07/12] vsock/virtio: MGS_ZEROCOPY flag support

On Mon, Feb 20, 2023 at 09:04:04AM +0000, Krasnov Arseniy wrote:
>On 16.02.2023 18:16, Stefano Garzarella wrote:
>> On Mon, Feb 06, 2023 at 07:00:35AM +0000, Arseniy Krasnov wrote:
>>> This adds main logic of MSG_ZEROCOPY flag processing for packet
>>> creation. When this flag is set and user's iov iterator fits for
>>> zerocopy transmission, call 'get_user_pages()' and add returned
>>> pages to the newly created skb.
>>>
>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@...rdevices.ru>
>>> ---
>>> net/vmw_vsock/virtio_transport_common.c | 212 ++++++++++++++++++++++--
>>> 1 file changed, 195 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index 05ce97b967ad..69e37f8a68a6 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -37,6 +37,169 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
>>>     return container_of(t, struct virtio_transport, transport);
>>> }
>>>
>>
>> I'd use bool if we don't need to return an error value in the following
>> new functions.
>>
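[As a userspace sketch of this suggestion (hypothetical, not the kernel code): a bool-returning variant of the checks in virtio_transport_can_zcopy() below. MAX_SKB_FRAGS is hardcoded here and the page size comes from sysconf(); struct iovec is the plain <sys/uio.h> one.]

```c
/* Hypothetical userspace sketch of a bool-returning variant of the
 * virtio_transport_can_zcopy() checks quoted below.  Not kernel code:
 * MAX_SKB_FRAGS is hardcoded (17 with 4 KiB pages in current kernels)
 * and the page size is taken from sysconf().
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_SKB_FRAGS 17

static bool can_zcopy(const struct iovec *iov, size_t nr_segs,
		      size_t free_space)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	size_t total = 0, pages = 0;
	size_t i;

	for (i = 0; i < nr_segs; i++)
		total += iov[i].iov_len;

	/* We can't send the whole iov: not enough credit. */
	if (free_space < total)
		return false;

	for (i = 0; i < nr_segs; i++) {
		size_t pages_in_elem;

		/* Base must be page aligned. */
		if ((uintptr_t)iov[i].iov_base % page_size)
			return false;

		/* Only the last element may have a non-page-aligned length. */
		if (i != nr_segs - 1 && iov[i].iov_len % page_size)
			return false;

		/* Round the element length up to whole pages. */
		pages_in_elem = (iov[i].iov_len + page_size - 1) / page_size;

		/* With user pages, one page is one frag. */
		if (pages + pages_in_elem > MAX_SKB_FRAGS)
			return false;

		pages += pages_in_elem;
	}

	return true;
}
```

[The three early returns map one-to-one onto the `return -1` cases in the patch; a bool return keeps the call sites readable since no caller distinguishes the failure reasons.]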
>>> +static int virtio_transport_can_zcopy(struct iov_iter *iov_iter,
>>> +                      size_t free_space)
>>> +{
>>> +    size_t pages;
>>> +    int i;
>>> +
>>> +    if (!iter_is_iovec(iov_iter))
>>> +        return -1;
>>> +
>>> +    if (iov_iter->iov_offset)
>>> +        return -1;
>>> +
>>> +    /* We can't send whole iov. */
>>> +    if (free_space < iov_iter->count)
>>> +        return -1;
>>> +
>>> +    for (pages = 0, i = 0; i < iov_iter->nr_segs; i++) {
>>> +        const struct iovec *iovec;
>>> +        int pages_in_elem;
>>> +
>>> +        iovec = &iov_iter->iov[i];
>>> +
>>> +        /* Base must be page aligned. */
>>> +        if (offset_in_page(iovec->iov_base))
>>> +            return -1;
>>> +
>>> +        /* Only last element could have not page aligned size.  */
>>> +        if (i != (iov_iter->nr_segs - 1)) {
>>> +            if (offset_in_page(iovec->iov_len))
>>> +                return -1;
>>> +
>>> +            pages_in_elem = iovec->iov_len >> PAGE_SHIFT;
>>> +        } else {
>>> +            pages_in_elem = round_up(iovec->iov_len, PAGE_SIZE);
>>> +            pages_in_elem >>= PAGE_SHIFT;
>>> +        }
>>> +
>>> +        /* In case of user's pages - one page is one frag. */
>>> +        if (pages + pages_in_elem > MAX_SKB_FRAGS)
>>> +            return -1;
>>> +
>>> +        pages += pages_in_elem;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>>> +                       struct sk_buff *skb,
>>> +                       struct iov_iter *iter,
>>> +                       bool zerocopy)
>>> +{
>>> +    struct ubuf_info_msgzc *uarg_zc;
>>> +    struct ubuf_info *uarg;
>>> +
>>> +    uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>>> +                    iov_length(iter->iov, iter->nr_segs),
>>> +                    NULL);
>>> +
>>> +    if (!uarg)
>>> +        return -1;
>>> +
>>> +    uarg_zc = uarg_to_msgzc(uarg);
>>> +    uarg_zc->zerocopy = zerocopy ? 1 : 0;
>>> +
>>> +    skb_zcopy_init(skb, uarg);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int virtio_transport_fill_nonlinear_skb(struct sk_buff *skb,
>>> +                           struct vsock_sock *vsk,
>>> +                           struct virtio_vsock_pkt_info *info)
>>> +{
>>> +    struct iov_iter *iter;
>>> +    int frag_idx;
>>> +    int seg_idx;
>>> +
>>> +    iter = &info->msg->msg_iter;
>>> +    frag_idx = 0;
>>> +    VIRTIO_VSOCK_SKB_CB(skb)->curr_frag = 0;
>>> +    VIRTIO_VSOCK_SKB_CB(skb)->frag_off = 0;
>>> +
>>> +    /* At this moment:
>>> +     * 1) 'iov_offset' is zero.
>>> +     * 2) Every 'iov_base' and 'iov_len' are also page aligned
>>> +     *    (except length of the last element).
>>> +     * 3) Number of pages in this iov <= MAX_SKB_FRAGS.
>>> +     * 4) Length of the data fits in current credit space.
>>> +     */
>>> +    for (seg_idx = 0; seg_idx < iter->nr_segs; seg_idx++) {
>>> +        struct page *user_pages[MAX_SKB_FRAGS];
>>> +        const struct iovec *iovec;
>>> +        size_t last_frag_len;
>>> +        size_t pages_in_seg;
>>> +        int page_idx;
>>> +
>>> +        iovec = &iter->iov[seg_idx];
>>> +        pages_in_seg = iovec->iov_len >> PAGE_SHIFT;
>>> +
>>> +        if (iovec->iov_len % PAGE_SIZE) {
>>> +            last_frag_len = iovec->iov_len % PAGE_SIZE;
>>> +            pages_in_seg++;
>>> +        } else {
>>> +            last_frag_len = PAGE_SIZE;
>>> +        }
>>> +
>>> +        if (get_user_pages((unsigned long)iovec->iov_base,
>>> +                   pages_in_seg, FOLL_GET, user_pages,
>>> +                   NULL) != pages_in_seg)
>>> +            return -1;
>>
>> Reading the get_user_pages() documentation, this should pin the user
>> pages, so we should be fine if we then expose them in the virtqueue.
>>
>> But reading Documentation/core-api/pin_user_pages.rst it seems that
>> drivers should use "pin_user_pages*() for DMA-pinned pages", so I'm not
>> sure what we should do.
>>
>That is a really interesting question for me too. IIUC, 'pin_user_pages()'
>sets a special value in the page's refcounter, so we can distinguish such
>pages from the others. I've grepped for the pinned-pages check and found
>that it is used in mm/vmscan.c, via 'folio_maybe_dma_pinned()', during
>page list processing. It seems 'pin_user_pages()' is a stricter version of
>'get_user_pages()', and using 'pin_' is recommended when the data on these
>pages will be accessed.
>I'll check which API the TCP implementation uses for zerocopy
>transmission.
>
>> Additional advice would be great!
>>
>> Anyway, when we are done using the pages, we should call put_page() or
>> unpin_user_page() depending on how we pin them.
>>
>In case of 'get_user_pages()' everything is ok here: when such an skb
>is released, 'put_page()' is called for every frag page of it,
>so there is no page leak.

Got it!

>But in case of 'pin_user_pages()',
>I will need to unpin them manually before calling 'consume_skb()',
>after the skb is processed by the virtio device. But anyway, it is not a
>problem.

Yep.
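[The pairing discussed above can be illustrated with a userspace mock. put_page(), unpin_user_page(), and struct mock_page here are stand-ins, not the kernel APIs; the point is only that every reference taken at fill time needs the matching release on the skb-consume path, and that get and pin references are released by different calls.]

```c
/* Userspace mock of the reference pairing discussed above.  Nothing
 * here is a kernel API: mock_acquire() stands in for the FOLL_GET /
 * pin_user_pages() step, mock_release_frags() for the release done
 * before (pin case) or during (get case) consume_skb().
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum gup_kind { GUP_GET, GUP_PIN };

struct mock_page {
	int refcount;	/* taken by a get_user_pages()-style FOLL_GET */
	int pincount;	/* taken by a pin_user_pages()-style pin */
};

static void mock_acquire(struct mock_page *p, enum gup_kind kind)
{
	if (kind == GUP_PIN)
		p->pincount++;	/* must be undone by unpin_user_page() */
	else
		p->refcount++;	/* must be undone by put_page() */
}

/* Release path for the zerocopy skb's frags: with FOLL_GET the skb
 * destructor's put_page() per frag is enough; with pins the caller
 * must unpin each page manually, as noted in the reply above.
 */
static void mock_release_frags(struct mock_page *pages, size_t n,
			       enum gup_kind kind)
{
	for (size_t i = 0; i < n; i++) {
		if (kind == GUP_PIN)
			pages[i].pincount--;	/* unpin_user_page() */
		else
			pages[i].refcount--;	/* put_page() */
	}
}

static bool mock_balanced(const struct mock_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pages[i].refcount || pages[i].pincount)
			return false;
	return true;
}
```

[Each frag page must go through exactly one acquire/release cycle of the matching kind; mixing the kinds (e.g. put_page() on a pinned page) would leave the pin count unbalanced.]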

Thanks,
Stefano
