Message-Id: <20190510125843.95587-1-sgarzare@redhat.com>
Date:   Fri, 10 May 2019 14:58:35 +0200
From:   Stefano Garzarella <sgarzare@...hat.com>
To:     netdev@...r.kernel.org
Cc:     "David S. Miller" <davem@...emloft.net>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        virtualization@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Stefan Hajnoczi <stefanha@...hat.com>,
        Jason Wang <jasowang@...hat.com>
Subject: [PATCH v2 0/8] vsock/virtio: optimizations to increase the throughput

While testing this new series (v2), I discovered very high memory usage
and a memory leak in the virtio-vsock driver in the guest when sending
1-byte packets to the guest.

These issues have been present since the introduction of the virtio-vsock
driver. I added patches 1 and 2 to this series to fix them, so that the
performance trends can be tracked more reliably.

v1: https://patchwork.kernel.org/cover/10885431/

v2:
- Add patch 1 to limit the memory usage
- Add patch 2 to avoid memory leak during the socket release
- Add patch 3 to fix locking of fwd_cnt and buf_alloc
- Patch 4: fix 'free_space' type (u32 instead of s64) [Stefan]
- Patch 5: Avoid integer underflow of iov_len [Stefan]
- Patch 5: Fix packet capture in order to see the exact packets that are
           delivered. [Stefan]
- Add patch 8 to make the RX buffer size tunable [Stefan]
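
For readers unfamiliar with the fwd_cnt and buf_alloc counters touched by
patches 3 and 4, here is a minimal Python sketch of credit accounting as
described in the virtio-vsock specification. It is not the driver code: the
function name and the example values are hypothetical, and clamping to zero
is an assumption made for illustration.

```python
U32_MASK = 0xFFFFFFFF  # the credit counters are 32-bit and wrap around


def peer_free_space(peer_buf_alloc, peer_fwd_cnt, tx_cnt):
    """Bytes the sender may still transmit without overrunning the peer.

    peer_buf_alloc: receive buffer space advertised by the peer
    peer_fwd_cnt:   free-running count of bytes the peer has consumed
    tx_cnt:         free-running count of bytes sent so far
    (hypothetical parameter names, modelled on the spec's field names)
    """
    # Emulate u32 subtraction so the result stays correct after wraparound.
    in_flight = (tx_cnt - peer_fwd_cnt) & U32_MASK
    # Assumption: report zero rather than a negative value when the peer
    # has shrunk its buffer below the amount already in flight.
    return max(0, peer_buf_alloc - in_flight)


# 4 KiB in flight against a 256 KiB buffer.
print(peer_free_space(262144, 0, 4096))
# tx_cnt has wrapped past 2**32 while peer_fwd_cnt has not yet.
print(peer_free_space(262144, 0xFFFFFF00, 100))
```

Masking the difference is what keeps the arithmetic valid across counter
wraparound, which is why the width and signedness of such variables matter.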

Below are the benchmarks, step by step. I used iperf3 [1] modified with VSOCK
support.
As Michael suggested in v1, I booted the host and guest with 'nosmap', and I
added a column with virtio-net+vhost-net performance.

A brief description of the patches:
- Patches 1+2: limit the memory usage with an extra copy and avoid memory leak
- Patches 3+4: fix locking and reduce the number of credit update messages sent
               to the transmitter
- Patches 5+6: allow the host to split packets across multiple buffers and use
               VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
- Patches 7+8: increase RX buffer size to 64 KiB
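
To illustrate the idea behind patches 5+6, the following Python sketch splits
one payload across fixed-size receive buffers. It only models the length
arithmetic; the function name is hypothetical, per-buffer headers are ignored,
and the 4 KiB buffer size used in the example is an assumption, not a value
taken from this series.

```python
def split_payload(payload_len, buf_size):
    """Return the chunk lengths needed to carry payload_len bytes
    in buffers of buf_size bytes each (hypothetical helper)."""
    if buf_size <= 0:
        raise ValueError("buf_size must be positive")
    chunks = []
    remaining = payload_len
    while remaining > 0:
        # Never copy more than what is left, so the remaining length
        # cannot go negative.
        n = min(remaining, buf_size)
        chunks.append(n)
        remaining -= n
    return chunks


# A 10000-byte payload over assumed 4 KiB buffers.
print(split_payload(10000, 4096))
```

With larger buffers the same payload needs fewer chunks, which is the
motivation for increasing the RX buffer size in patches 7+8.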

                    host -> guest [Gbps]
pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
                                                                     TCP_NODELAY
64         0.068     0.063    0.130    0.131    0.128         0.188     0.187
256        0.274     0.236    0.392    0.338    0.282         0.749     0.654
512        0.531     0.457    0.862    0.725    0.602         1.419     1.414
1K         0.954     0.827    1.591    1.598    1.548         2.599     2.640
2K         1.783     1.543    3.731    3.637    3.469         4.530     4.754
4K         3.332     3.436    7.164    7.124    6.494         7.738     7.696
8K         5.792     5.530   11.653   11.787   11.444        12.307    11.850
16K        8.405     8.462   16.372   16.855   17.562        16.936    16.954
32K       14.208    13.669   18.945   20.009   23.128        21.980    23.015
64K       21.082    18.893   20.266   20.903   30.622        27.290    27.383
128K      20.696    20.148   20.112   21.746   32.152        30.446    30.990
256K      20.801    20.589   20.725   22.685   34.721        33.151    32.745
512K      21.220    20.465   20.432   22.106   34.496        36.847    31.096

                    guest -> host [Gbps]
pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
                                                                     TCP_NODELAY
64         0.089     0.091    0.120    0.115    0.117         0.274     0.272
256        0.352     0.354    0.452    0.445    0.451         1.085     1.136
512        0.705     0.704    0.893    0.858    0.898         2.131     1.882
1K         1.394     1.433    1.721    1.669    1.691         3.984     3.576
2K         2.818     2.874    3.316    3.249    3.303         6.719     6.359
4K         5.293     5.397    6.129    5.933    6.082        10.105     9.860
8K         8.890     9.151   10.990   10.545   10.519        15.239    14.868
16K       11.444    11.018   12.074   15.255   15.577        20.551    20.848
32K       11.229    10.875   10.857   24.401   25.227        26.294    26.380
64K       10.832    10.545   10.816   39.487   39.616        34.996    32.041
128K      10.435    10.241   10.500   39.813   40.012        38.379    35.055
256K      10.263     9.866    9.845   34.971   35.143        36.559    37.232
512K      10.224    10.060   10.092   35.469   34.627        34.963    33.401

As Stefan suggested in v1, this time I also measured the efficiency, defined
as follows:
    efficiency = Mbps / (%CPU_Host + %CPU_Guest)

The '%CPU_Guest' is measured inside the VM. I know this is not the best way,
but iperf3 provides it for free and it can serve as an indication.
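
As a worked example of the formula above, here is a small Python sketch.
The function name and the input values are hypothetical and are not taken
from the measurements; the only assumption about units is that 1 Gbps equals
1000 Mbps, since the throughput tables are in Gbps and the efficiency is in
Mbps per CPU percentage point.

```python
def efficiency(throughput_gbps, cpu_host_pct, cpu_guest_pct):
    """Mbps per percentage point of combined host and guest CPU usage."""
    total_cpu = cpu_host_pct + cpu_guest_pct
    if total_cpu <= 0:
        raise ValueError("combined CPU usage must be positive")
    # Convert Gbps to Mbps before dividing.
    return throughput_gbps * 1000.0 / total_cpu


# Illustrative inputs only: 20 Gbps with 60% host CPU and 40% guest CPU.
print(efficiency(20.0, 60.0, 40.0))
```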

        host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
                                                                     TCP_NODELAY
64          0.94      0.59     3.96     4.06     4.09          2.82      2.11
256         2.62      2.50     6.45     6.09     5.81          9.64      8.73
512         5.16      4.87    13.16    12.39    11.67         17.83     17.76
1K          9.16      8.85    24.98    24.97    25.01         32.57     32.04
2K         17.41     17.03    49.09    48.59    49.22         55.31     57.14
4K         32.99     33.62    90.80    90.98    91.72         91.79     91.40
8K         58.51     59.98   153.53   170.83   167.31        137.51    132.85
16K        89.32     95.29   216.98   264.18   260.95        176.05    176.05
32K       152.94    167.10   285.75   387.02   360.81        215.49    226.30
64K       250.38    307.20   317.65   489.53   472.70        238.97    244.27
128K      327.99    335.24   335.76   523.71   486.41        253.29    260.86
256K      327.06    334.24   338.64   533.76   509.85        267.78    266.22
512K      337.36    330.61   334.95   512.90   496.35        280.42    241.43

        guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt  p 1+2    p 3+4    p 5+6    p 7+8       virtio-net + vhost
                                                                     TCP_NODELAY
64          0.90      0.91     1.37     1.32     1.35          2.15      2.13
256         3.59      3.55     5.23     5.19     5.29          8.50      8.89
512         7.19      7.08    10.21     9.95    10.38         16.74     14.71
1K         14.15     14.34    19.85    19.06    19.33         31.44     28.11
2K         28.44     29.09    37.78    37.18    37.49         53.07     50.63
4K         55.37     57.60    71.02    69.27    70.97         81.56     79.32
8K        105.58    100.45   111.95   124.68   123.61        120.85    118.66
16K       141.63    138.24   137.67   187.41   190.20        160.43    163.00
32K       147.56    143.09   138.48   296.41   301.04        214.64    223.94
64K       144.81    143.27   138.49   433.98   462.26        298.86    269.71
128K      150.14    147.99   146.85   511.36   514.29        350.17    298.09
256K      156.69    152.25   148.69   542.19   549.97        326.42    333.32
512K      157.29    153.35   152.22   546.52   533.24        315.55    302.27

[1] https://github.com/stefano-garzarella/iperf/

Stefano Garzarella (8):
  vsock/virtio: limit the memory used per-socket
  vsock/virtio: free packets during the socket release
  vsock/virtio: fix locking for fwd_cnt and buf_alloc
  vsock/virtio: reduce credit update messages
  vhost/vsock: split packets to send using multiple buffers
  vsock/virtio: change the maximum packet size allowed
  vsock/virtio: increase RX buffer size to 64 KiB
  vsock/virtio: make the RX buffer size tunable

 drivers/vhost/vsock.c                   |  53 +++++++--
 include/linux/virtio_vsock.h            |  14 ++-
 net/vmw_vsock/virtio_transport.c        |  28 ++++-
 net/vmw_vsock/virtio_transport_common.c | 144 ++++++++++++++++++------
 4 files changed, 190 insertions(+), 49 deletions(-)

-- 
2.20.1
