Date:   Tue, 14 May 2019 11:40:11 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Stefano Garzarella <sgarzare@...hat.com>
Cc:     netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        virtualization@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Stefan Hajnoczi <stefanha@...hat.com>
Subject: Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket


On 2019/5/14 11:25 AM, Jason Wang wrote:
>
> On 2019/5/14 1:23 AM, Stefano Garzarella wrote:
>> On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:
>>> On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
>>>> Since virtio-vsock was introduced, the buffers filled by the host
>>>> and pushed to the guest using the vring are queued directly in
>>>> a per-socket list, avoiding a copy.
>>>> These buffers are preallocated by the guest with a fixed
>>>> size (4 KB).
>>>>
>>>> The maximum amount of memory used by each socket should be
>>>> controlled by the credit mechanism.
>>>> The default credit available per-socket is 256 KB, but if we use
>>>> only 1 byte per packet, the guest can queue up to 262144 4 KB
>>>> buffers, using up to 1 GB of memory per-socket. In addition, the
>>>> guest will continue to fill the vring with new 4 KB free buffers
>>>> to avoid starvation of other sockets.
>>>>
>>>> This patch solves this issue by copying the payload into a new buffer.
>>>> The new buffer is then queued in the per-socket list, and the 4 KB
>>>> buffer used by the host is freed.
>>>>
>>>> In this way, the memory used by each socket respects the credit
>>>> available, and we still avoid starvation, paying the cost of an
>>>> extra memory copy. When the buffer is completely full we do a
>>>> "zero-copy", moving the buffer directly in the per-socket list.
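
The worst case described in the commit message above can be checked with a little arithmetic. What follows is a minimal userspace sketch, not kernel code: the function names `pinned_without_copy` and `pinned_with_copy` are hypothetical, and allocator and per-buffer struct overhead are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes held by one socket when every received packet keeps its whole
 * preallocated rx buffer on the per-socket list (no copy).
 * credit:  credit available per socket, counted in payload bytes
 * payload: payload bytes carried by each packet
 * rx_buf:  size of each preallocated rx buffer
 */
static uint64_t pinned_without_copy(uint64_t credit, uint64_t payload,
                                    uint64_t rx_buf)
{
    assert(payload > 0 && payload <= rx_buf);
    uint64_t packets = credit / payload; /* credit only limits payload bytes */
    return packets * rx_buf;             /* but each packet pins a full buffer */
}

/* Bytes held when each payload is copied into a buffer of its own size,
 * so the queued data can never exceed the credit.
 */
static uint64_t pinned_with_copy(uint64_t credit, uint64_t payload)
{
    assert(payload > 0);
    return (credit / payload) * payload;
}
```

With the values from the commit message, `pinned_without_copy(256 * 1024, 1, 4096)` gives 1073741824 bytes (1 GB), while `pinned_with_copy(256 * 1024, 1)` stays at 262144 bytes (256 KB).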
>>>
>>> I wonder whether, in the long run, we should use the generic socket
>>> accounting mechanism provided by the kernel (e.g. socket, skb, sndbuf,
>>> recvbuf, truesize) instead of a vsock-specific one, to avoid
>>> duplicating effort.
>> I agree, the idea is to switch to sk_buff, but this would require a huge
>> change. If we use the virtio-net datapath, it will become simpler.
>
>
> Yes, the unix domain socket is one example that uses the general skb and
> socket structures. And we probably need some kind of socket pair on the
> host. Using sockets can also simplify the unification with vhost-net,
> which depends on the socket proto_ops to work. I admit it's probably a
> huge change, but we can do it gradually.
>
>
>>>
>>>> Signed-off-by: Stefano Garzarella <sgarzare@...hat.com>
>>>> ---
>>>>    drivers/vhost/vsock.c                   |  2 +
>>>>    include/linux/virtio_vsock.h            |  8 +++
>>>>    net/vmw_vsock/virtio_transport.c        |  1 +
>>>>    net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
>>>>    4 files changed, 81 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>>> index bb5fc0e9fbc2..7964e2daee09 100644
>>>> --- a/drivers/vhost/vsock.c
>>>> +++ b/drivers/vhost/vsock.c
>>>> @@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
>>>>            return NULL;
>>>>        }
>>>> +    pkt->buf_len = pkt->len;
>>>> +
>>>>        nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
>>>>        if (nbytes != pkt->len) {
>>>>            vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
>>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>>> index e223e2632edd..345f04ee9193 100644
>>>> --- a/include/linux/virtio_vsock.h
>>>> +++ b/include/linux/virtio_vsock.h
>>>> @@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
>>>>        void *buf;
>>>>        u32 len;
>>>>        u32 off;
>>>> +    u32 buf_len;
>>>>        bool reply;
>>>>    };
>>>> +struct virtio_vsock_buf {
>>>> +    struct list_head list;
>>>> +    void *addr;
>>>> +    u32 len;
>>>> +    u32 off;
>>>> +};
>>>> +
>>>>    struct virtio_vsock_pkt_info {
>>>>        u32 remote_cid, remote_port;
>>>>        struct vsock_sock *vsk;
>>>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>>>> index 15eb5d3d4750..af1d2ce12f54 100644
>>>> --- a/net/vmw_vsock/virtio_transport.c
>>>> +++ b/net/vmw_vsock/virtio_transport.c
>>>> @@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>>>>                break;
>>>>            }
>>>> +        pkt->buf_len = buf_len;
>>>>            pkt->len = buf_len;
>>>>            sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
>>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>>> index 602715fc9a75..0248d6808755 100644
>>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>>> @@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>>>>            pkt->buf = kmalloc(len, GFP_KERNEL);
>>>>            if (!pkt->buf)
>>>>                goto out_pkt;
>>>> +
>>>> +        pkt->buf_len = len;
>>>> +
>>>>            err = memcpy_from_msg(pkt->buf, info->msg, len);
>>>>            if (err)
>>>>                goto out;
>>>> @@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>>>>        return NULL;
>>>>    }
>>>> +static struct virtio_vsock_buf *
>>>> +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
>>>> +{
>>>> +    struct virtio_vsock_buf *buf;
>>>> +
>>>> +    if (pkt->len == 0)
>>>> +        return NULL;
>>>> +
>>>> +    buf = kzalloc(sizeof(*buf), GFP_KERNEL);
>>>> +    if (!buf)
>>>> +        return NULL;
>>>> +
>>>> +    /* If the buffer in the virtio_vsock_pkt is full, we can move it to
>>>> +     * the new virtio_vsock_buf avoiding the copy, because we are sure that
>>>> +     * we do not use more memory than that counted by the credit mechanism.
>>>> +     */
>>>> +    if (zero_copy && pkt->len == pkt->buf_len) {
>>>> +        buf->addr = pkt->buf;
>>>> +        pkt->buf = NULL;
>>>> +    } else {
>>>
>>> Is the copy still needed if we're just a few bytes short? We met a
>>> similar issue in virtio-net, and virtio-net solves it by always copying
>>> the first 128 bytes for big packets.
>>>
>>> See receive_big()
>> I'm looking at it; it is more sophisticated.
>> IIUC, virtio-net allocates a sk_buff with 128 bytes of buffer, then
>> copies the first 128 bytes, then adds the buffer used to receive the
>> packet as a frag to the skb.
>
>
> Yes, and the point is that if the packet is smaller than 128 bytes, the
> pages will be recycled.


To be clear, this only works if you use order-0 pages instead of a large
buffer allocated through kmalloc(). That is another reason to require
order-0 pages.

Thanks
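
The move-or-copy decision discussed in this thread can be modelled in userspace as below. This is only a sketch under stated assumptions, not the patch itself: `struct pkt`, `struct qbuf`, `queue_payload` and the `slack` parameter are hypothetical names, the patch's `zero_copy` flag is omitted, and the copy branch is inferred from the commit message because the quoted diff stops at the `else`. With `slack` set to 0 the condition matches the quoted `pkt->len == pkt->buf_len` check; a non-zero `slack` illustrates the "just a few bytes short" case raised in the review, at the price of queued memory exceeding the credit by up to `slack` bytes per buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the packet and the queued per-socket buffer. */
struct pkt  { void *buf; uint32_t len; uint32_t buf_len; };
struct qbuf { void *addr; uint32_t len; int moved; };

/* Returns a buffer to queue on the socket, or NULL on empty packet or
 * allocation failure. If at most `slack` bytes of the rx buffer are unused,
 * ownership of pkt->buf moves to the queued buffer; otherwise the payload is
 * copied into an exactly sized allocation and pkt->buf stays with the caller.
 */
static struct qbuf *queue_payload(struct pkt *pkt, uint32_t slack)
{
    struct qbuf *b;

    assert(pkt->len <= pkt->buf_len);
    if (pkt->len == 0)
        return NULL;

    b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    b->len = pkt->len;

    if (pkt->buf_len - pkt->len <= slack) {
        b->addr = pkt->buf;   /* move: no copy, buffer is (nearly) full */
        pkt->buf = NULL;
        b->moved = 1;
    } else {
        b->addr = malloc(pkt->len);
        if (!b->addr) {
            free(b);
            return NULL;
        }
        memcpy(b->addr, pkt->buf, pkt->len); /* copy only the payload */
    }
    return b;
}
```

In the copy case the caller still owns `pkt->buf` and is expected to free or recycle it, which mirrors the commit message's statement that the 4 KB buffer used by the host is freed.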

