lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 13 May 2019 17:33:40 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Stefano Garzarella <sgarzare@...hat.com>, netdev@...r.kernel.org
Cc:     "David S. Miller" <davem@...emloft.net>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        virtualization@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Stefan Hajnoczi <stefanha@...hat.com>
Subject: Re: [PATCH v2 0/8] vsock/virtio: optimizations to increase the
 throughput


On 2019/5/10 下午8:58, Stefano Garzarella wrote:
> While I was testing this new series (v2) I discovered an huge use of memory
> and a memory leak in the virtio-vsock driver in the guest when I sent
> 1-byte packets to the guest.
>
> These issues are present since the introduction of the virtio-vsock
> driver. I added the patches 1 and 2 to fix them in this series in order
> to better track the performance trends.
>
> v1: https://patchwork.kernel.org/cover/10885431/
>
> v2:
> - Add patch 1 to limit the memory usage
> - Add patch 2 to avoid memory leak during the socket release
> - Add patch 3 to fix locking of fwd_cnt and buf_alloc
> - Patch 4: fix 'free_space' type (u32 instead of s64) [Stefan]
> - Patch 5: Avoid integer underflow of iov_len [Stefan]
> - Patch 5: Fix packet capture in order to see the exact packets that are
>             delivered. [Stefan]
> - Add patch 8 to make the RX buffer size tunable [Stefan]
>
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support.
> As Micheal suggested in the v1, I booted host and guest with 'nosmap', and I
> added a column with virtio-net+vhost-net performance.
>
> A brief description of patches:
> - Patches 1+2: limit the memory usage with an extra copy and avoid memory leak
> - Patches 3+4: fix locking and reduce the number of credit update messages sent
>                 to the transmitter
> - Patches 5+6: allow the host to split packets on multiple buffers and use
>                 VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> - Patches 7+8: increase RX buffer size to 64 KiB
>
>                      host -> guest [Gbps]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64         0.068     0.063    0.130    0.131    0.128         0.188     0.187
> 256        0.274     0.236    0.392    0.338    0.282         0.749     0.654
> 512        0.531     0.457    0.862    0.725    0.602         1.419     1.414
> 1K         0.954     0.827    1.591    1.598    1.548         2.599     2.640
> 2K         1.783     1.543    3.731    3.637    3.469         4.530     4.754
> 4K         3.332     3.436    7.164    7.124    6.494         7.738     7.696
> 8K         5.792     5.530   11.653   11.787   11.444        12.307    11.850
> 16K        8.405     8.462   16.372   16.855   17.562        16.936    16.954
> 32K       14.208    13.669   18.945   20.009   23.128        21.980    23.015
> 64K       21.082    18.893   20.266   20.903   30.622        27.290    27.383
> 128K      20.696    20.148   20.112   21.746   32.152        30.446    30.990
> 256K      20.801    20.589   20.725   22.685   34.721        33.151    32.745
> 512K      21.220    20.465   20.432   22.106   34.496        36.847    31.096
>
>                      guest -> host [Gbps]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64         0.089     0.091    0.120    0.115    0.117         0.274     0.272
> 256        0.352     0.354    0.452    0.445    0.451         1.085     1.136
> 512        0.705     0.704    0.893    0.858    0.898         2.131     1.882
> 1K         1.394     1.433    1.721    1.669    1.691         3.984     3.576
> 2K         2.818     2.874    3.316    3.249    3.303         6.719     6.359
> 4K         5.293     5.397    6.129    5.933    6.082        10.105     9.860
> 8K         8.890     9.151   10.990   10.545   10.519        15.239    14.868
> 16K       11.444    11.018   12.074   15.255   15.577        20.551    20.848
> 32K       11.229    10.875   10.857   24.401   25.227        26.294    26.380
> 64K       10.832    10.545   10.816   39.487   39.616        34.996    32.041
> 128K      10.435    10.241   10.500   39.813   40.012        38.379    35.055
> 256K      10.263     9.866    9.845   34.971   35.143        36.559    37.232
> 512K      10.224    10.060   10.092   35.469   34.627        34.963    33.401
>
> As Stefan suggested in the v1, this time I measured also the efficiency in this
> way:
>      efficiency = Mbps / (%CPU_Host + %CPU_Guest)
>
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free from iperf3 and could be an indication.
>
>          host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64          0.94      0.59     3.96     4.06     4.09          2.82      2.11
> 256         2.62      2.50     6.45     6.09     5.81          9.64      8.73
> 512         5.16      4.87    13.16    12.39    11.67         17.83     17.76
> 1K          9.16      8.85    24.98    24.97    25.01         32.57     32.04
> 2K         17.41     17.03    49.09    48.59    49.22         55.31     57.14
> 4K         32.99     33.62    90.80    90.98    91.72         91.79     91.40
> 8K         58.51     59.98   153.53   170.83   167.31        137.51    132.85
> 16K        89.32     95.29   216.98   264.18   260.95        176.05    176.05
> 32K       152.94    167.10   285.75   387.02   360.81        215.49    226.30
> 64K       250.38    307.20   317.65   489.53   472.70        238.97    244.27
> 128K      327.99    335.24   335.76   523.71   486.41        253.29    260.86
> 256K      327.06    334.24   338.64   533.76   509.85        267.78    266.22
> 512K      337.36    330.61   334.95   512.90   496.35        280.42    241.43
>
>          guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
>                                                                       TCP_NODELAY
> 64          0.90      0.91     1.37     1.32     1.35          2.15      2.13
> 256         3.59      3.55     5.23     5.19     5.29          8.50      8.89
> 512         7.19      7.08    10.21     9.95    10.38         16.74     14.71
> 1K         14.15     14.34    19.85    19.06    19.33         31.44     28.11
> 2K         28.44     29.09    37.78    37.18    37.49         53.07     50.63
> 4K         55.37     57.60    71.02    69.27    70.97         81.56     79.32
> 8K        105.58    100.45   111.95   124.68   123.61        120.85    118.66
> 16K       141.63    138.24   137.67   187.41   190.20        160.43    163.00
> 32K       147.56    143.09   138.48   296.41   301.04        214.64    223.94
> 64K       144.81    143.27   138.49   433.98   462.26        298.86    269.71
> 128K      150.14    147.99   146.85   511.36   514.29        350.17    298.09
> 256K      156.69    152.25   148.69   542.19   549.97        326.42    333.32
> 512K      157.29    153.35   152.22   546.52   533.24        315.55    302.27
>
> [1] https://github.com/stefano-garzarella/iperf/


Hi:

Do you have any explanation that vsock is better here? Is this because 
of the mergeable buffer? If you, we need test with mrg_rxbuf=off.

Thanks


>
> Stefano Garzarella (8):
>    vsock/virtio: limit the memory used per-socket
>    vsock/virtio: free packets during the socket release
>    vsock/virtio: fix locking for fwd_cnt and buf_alloc
>    vsock/virtio: reduce credit update messages
>    vhost/vsock: split packets to send using multiple buffers
>    vsock/virtio: change the maximum packet size allowed
>    vsock/virtio: increase RX buffer size to 64 KiB
>    vsock/virtio: make the RX buffer size tunable
>
>   drivers/vhost/vsock.c                   |  53 +++++++--
>   include/linux/virtio_vsock.h            |  14 ++-
>   net/vmw_vsock/virtio_transport.c        |  28 ++++-
>   net/vmw_vsock/virtio_transport_common.c | 144 ++++++++++++++++++------
>   4 files changed, 190 insertions(+), 49 deletions(-)
>

Powered by blists - more mailing lists