Message-ID: <5C1384E8.7040506@huawei.com>
Date:   Fri, 14 Dec 2018 18:24:40 +0800
From:   jiangyiwen <jiangyiwen@...wei.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
CC:     Stefan Hajnoczi <stefanha@...hat.com>,
        Jason Wang <jasowang@...hat.com>, <netdev@...r.kernel.org>,
        <kvm@...r.kernel.org>, <virtualization@...ts.linux-foundation.org>
Subject: Re: [PATCH v2 0/5] VSOCK: support mergeable rx buffer in vhost-vsock

On 2018/12/12 23:09, Michael S. Tsirkin wrote:
> On Wed, Dec 12, 2018 at 05:25:50PM +0800, jiangyiwen wrote:
>> Currently vsock only supports sending and receiving small packets, so
>> it cannot achieve high performance. As previously discussed with Jason
>> Wang, I revisited the mergeable rx buffer idea from vhost-net and
>> implemented it in vhost-vsock; it allows a big packet to be scattered
>> across different buffers and noticeably improves performance.
>>
>> This series of patches mainly does three things:
>> - mergeable buffer implementation
>> - increase the max send pkt size
>> - add used and signal guest in a batch
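As a rough sketch, the third item (batching used-ring updates and guest
notifications) could look like this on the vhost side.
vhost_add_used_and_signal_n() is the existing vhost helper; the batch
size and the flush helper are illustrative assumptions, not the actual
patch code:

/* Sketch only: collect used heads and flush them in one shot, so the
 * guest sees one used-ring update and at most one interrupt per batch. */
#define VSOCK_RX_BATCH	16		/* assumed batch size */

static void vsock_flush_used(struct vhost_virtqueue *vq,
			     struct vring_used_elem *heads, int n)
{
	if (n)
		vhost_add_used_and_signal_n(vq->dev, vq, heads, n);
}

The send path would fill heads[] as rx buffers are consumed and flush
once per VSOCK_RX_BATCH buffers (or when no packets remain), instead of
signaling the guest for every buffer.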
>>
>> I wrote a tool to test vhost-vsock performance, mainly sending big
>> packets (64K) in both Guest->Host and Host->Guest directions. I tested
>> each change independently; the results are as follows:
>>
>> Performance before the patches:
>>               Single socket            Multiple sockets(Max Bandwidth)
>> Guest->Host   ~400MB/s                 ~480MB/s
>> Host->Guest   ~1450MB/s                ~1600MB/s
>>
>> Performance with only the mergeable rx buffer implemented:
>>               Single socket            Multiple sockets(Max Bandwidth)
>> Guest->Host   ~400MB/s                 ~480MB/s
>> Host->Guest   ~1280MB/s                ~1350MB/s
>>
>> In this case, the max send pkt size is still limited to 4K, so the
>> Host->Guest performance is worse than before.
> 
> It's concerning, though. What if an application sends small packets?
> What is the source of the slowdown? Do you know?
> 

Hi Michael,

For those two questions, I tested both small and big packets; the
results are as follows:

64K packets performance comparison:
                                              Single socket    Multiple sockets
Host->Guest(before)                           1352.60MB/s      1436.33MB/s
Host->Guest(only use mergeable rx buffer)     1290.08MB/s      1212.67MB/s

4K packets performance comparison:
                                              Single socket    Multiple sockets
Host->Guest(before)                           535.47MB/s       688.67MB/s
Host->Guest(only use mergeable rx buffer)     522.33MB/s       599.00MB/s

3K packets performance comparison:
                                              Single socket    Multiple sockets
Host->Guest(before)                           359.74MB/s       442.00MB/s
Host->Guest(only use mergeable rx buffer)     374.47MB/s       452.33MB/s

We can see something interesting: for 64K and 4K packets, the mergeable
rx buffer performs worse, while for 3K packets both perform about the
same.

I guess that in mergeable mode, when the host sends a 4K packet to the
guest, the host must call vhost_get_vq_desc() twice (hdr + 4K data), and
the guest must likewise call virtqueue_get_buf() twice. When a packet is
smaller than (4K - hdr), it can be packed into a single page, so the
performance is the same as before.

So in mergeable mode, performance may be worse than before for packet
sizes in ((4K - hdr), 4K].
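
To make the cost concrete, here is a rough sketch of the guest-side
receive path I have in mind. virtqueue_get_buf() and le16_to_cpu() are
existing kernel APIs; the num_buffers field and the assembly step are
illustrative assumptions:

/* Rough sketch, not the patch code: drain one packet in mergeable mode.
 * With PAGE_SIZE rx buffers, up to PAGE_SIZE - sizeof(hdr) bytes of
 * data (4096 - 44 = 4052 with the uapi virtio_vsock header) fit in the
 * first buffer together with the header; a packet in (4052, 4096]
 * spills into a second buffer and costs a second virtqueue_get_buf(). */
static void vsock_rx_one_pkt(struct virtqueue *vq)
{
	struct virtio_vsock_hdr *hdr;
	unsigned int len;
	void *buf;
	u16 nbufs;

	buf = virtqueue_get_buf(vq, &len);	/* buffer 1: hdr (+ data) */
	if (!buf)
		return;
	hdr = buf;
	nbufs = le16_to_cpu(hdr->num_buffers);	/* num_buffers is assumed */

	while (nbufs-- > 1) {			/* extra data buffers */
		buf = virtqueue_get_buf(vq, &len);
		if (!buf)
			break;
		/* append buf[0..len) to the packet being assembled */
	}
}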

Thanks,
Yiwen.

>> Performance after increasing the max send pkt size to 64K:
>>               Single socket            Multiple sockets(Max Bandwidth)
>> Guest->Host   ~1700MB/s                ~2900MB/s
>> Host->Guest   ~1500MB/s                ~2440MB/s
>>
>> Performance with all patches applied:
>>               Single socket            Multiple sockets(Max Bandwidth)
>> Guest->Host   ~1700MB/s                ~2900MB/s
>> Host->Guest   ~1700MB/s                ~2900MB/s
>>
>> From the test results, the performance is clearly improved, and guest
>> memory is not wasted.
>>
>> In addition, in order to support mergeable rx buffers in virtio-vsock,
>> we need to add a qemu patch to support parsing the feature.
>>
>> ---
>> v1 -> v2:
>>  * Addressed comments from Jason Wang.
>>  * Added the performance test results independently for each change.
>>  * Use skb_page_frag_refill(), which can use high-order pages and
>>    reduces pressure on the page allocator.
>>  * Still use a fixed size (PAGE_SIZE) to fill rx buffers, because too
>>    small a size cannot hold one full packet; we only have 128 vq
>>    entries now.
>>  * Use an iovec to replace buf in struct virtio_vsock_pkt, keeping tx
>>    and rx consistent (see the sketch after this list).
>>  * Add a virtio_transport op to get the max pkt len, in order to stay
>>    compatible with old versions.
>> ---
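As a sketch of the iovec change above: vec and nr_vecs are assumed
names, while hdr, list and len mirror the existing struct, which
carried a single linear buf:

/* Sketch only, not the patch: replace the linear buffer with segments. */
struct virtio_vsock_pkt {
	struct virtio_vsock_hdr	hdr;
	struct list_head	list;
	/* was: void *buf; u32 off;  -- one linear data buffer */
	struct kvec		*vec;	/* data segments (kernel-space iovec) */
	int			nr_vecs;
	u32			len;
};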
>>
>> Yiwen Jiang (5):
>>   VSOCK: support fill mergeable rx buffer in guest
>>   VSOCK: support fill data to mergeable rx buffer in host
>>   VSOCK: support receive mergeable rx buffer in guest
>>   VSOCK: increase send pkt len in mergeable mode to improve performance
>>   VSOCK: batch sending rx buffer to increase bandwidth
>>
>>  drivers/vhost/vsock.c                   | 183 ++++++++++++++++++++-----
>>  include/linux/virtio_vsock.h            |  13 +-
>>  include/uapi/linux/virtio_vsock.h       |   5 +
>>  net/vmw_vsock/virtio_transport.c        | 229 +++++++++++++++++++++++++++-----
>>  net/vmw_vsock/virtio_transport_common.c |  66 ++++++---
>>  5 files changed, 411 insertions(+), 85 deletions(-)
>>
>> -- 
>> 1.8.3.1
> 
> .
> 

