netdev - Re: [PATCH net-next v4 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations

lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <9fa21e91-f92d-03a2-aac6-cfa378fb84eb@sberdevices.ru>
Date: Sun, 30 Jul 2023 11:57:19 +0300
From: Arseniy Krasnov <avkrasnov@...rdevices.ru>
To: "Michael S. Tsirkin" <mst@...hat.com>
CC: Stefan Hajnoczi <stefanha@...hat.com>, Stefano Garzarella
	<sgarzare@...hat.com>, "David S. Miller" <davem@...emloft.net>, Eric Dumazet
	<edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
	<pabeni@...hat.com>, Jason Wang <jasowang@...hat.com>, Bobby Eshleman
	<bobby.eshleman@...edance.com>, <kvm@...r.kernel.org>,
	<virtualization@...ts.linux-foundation.org>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <kernel@...rdevices.ru>, <oxffffaa@...il.com>
Subject: Re: [PATCH net-next v4 0/4] vsock/virtio/vhost: MSG_ZEROCOPY
 preparations



On 28.07.2023 08:45, Michael S. Tsirkin wrote:
> On Fri, Jul 28, 2023 at 01:26:23AM +0300, Arseniy Krasnov wrote:
>> Hello,
>>
>> this patchset is the first of three parts of another big patchset for
>> MSG_ZEROCOPY flag support:
>> https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/
> 
> overall looks good. Two points I'd like to see addressed:
> - what's the performance with all these changes - still same?

Hello Michael,

here are the results for the latest version:

These numbers differ somewhat from the ones posted for v3. The new
version of zerocopy seems to be slower on big buffers, but it is still
faster than copy mode in all cases (except the line marked <<<<<< below;
I got the same result for that test case with v3 before).

I tried to find the reason for this difference by switching back to v3,
but it is not easy: I get the current results again. Possible reasons:
1) My environment changed: I run this test in nested virtualization
   mode, so the host OS may also affect performance.
2) A mistake on my side in v3 :(

Anyway:
1) MSG_ZEROCOPY is still faster than copy, as expected.

2) I've added a column with the benchmark on 'net-next' without the
   MSG_ZEROCOPY patchset. The patchset does not seem to affect copy
   performance. Differences like 26 vs. 29 are not significant: the
   results are noisy, e.g. if you run the same test again, you can get
   the opposite result, 29 vs. 26.

3) The numbers below can be considered valid. This is the newest
   measurement.
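To put the "26 vs. 29" remark into numbers, here is a small sketch that computes the relative difference between the "copy" and "copy w/o MSG_ZEROCOPY patchset" columns of the three tables in this message. It only does arithmetic on the published values; the metric |a - b| / max(a, b) and the helper names rel_diff_pct() and max_rel_diff_pct() are assumptions of this sketch, not part of the original benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * "copy" and "copy w/o MSG_ZEROCOPY patchset" columns in Gbit/s.
 * Order: G2H, H2G, loopback; each with rows 4KB, 32KB, 256KB, 1M, 8M.
 */
static const int copy_patched[15] = {
	3, 9, 30, 27, 26,	/* G2H */
	17, 30, 35, 29, 28,	/* H2G */
	8, 27, 38, 37, 40,	/* loopback */
};
static const int copy_baseline[15] = {
	3, 10, 29, 30, 29,	/* G2H */
	17, 31, 30, 28, 28,	/* H2G */
	8, 30, 39, 39, 36,	/* loopback */
};

/* Relative difference in percent: |a - b| / max(a, b) * 100. */
static double rel_diff_pct(int a, int b)
{
	int hi = a > b ? a : b;

	assert(hi > 0);
	return 100.0 * abs(a - b) / hi;
}

/* Largest relative difference over n measured pairs. */
static double max_rel_diff_pct(const int *a, const int *b, size_t n)
{
	double max = 0.0;

	for (size_t i = 0; i < n; i++) {
		double d = rel_diff_pct(a[i], b[i]);

		if (d > max)
			max = d;
	}
	return max;
}
```

With these values the largest gap is about 14% (H2G at 256KB, 35 vs. 30), and the 26 vs. 29 case is about 10%, which gives a rough idea of the run-to-run noise the copy numbers carry.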


G2H (guest to host) transmission (values are Gbit/s):

   Core i7 with nested guest.

*-------------------------------*-----------------------*
|          |         |          |                       |
| buf size |   copy  | zerocopy | copy w/o MSG_ZEROCOPY |
|          |         |          |       patchset        |
|          |         |          |                       |
*-------------------------------*-----------------------*
|   4KB    |    3    |    11    |           3           |
*-------------------------------*-----------------------*
|   32KB   |    9    |    70    |          10           |
*-------------------------------*-----------------------*
|   256KB  |   30    |   224    |          29           |
*-------------------------------*-----------------------*
|    1M    |   27    |   285    |          30           |
*-------------------------------*-----------------------*
|    8M    |   26    |   365    |          29           |
*-------------------------------*-----------------------*


H2G (host to guest) transmission (values are Gbit/s):

   Core i7 with nested guest.

*-------------------------------*-----------------------*
|          |         |          |                       |
| buf size |   copy  | zerocopy | copy w/o MSG_ZEROCOPY |
|          |         |          |       patchset        |
|          |         |          |                       |
*-------------------------------*-----------------------*
|   4KB    |   17    |    10    |          17           | <<<<<<
*-------------------------------*-----------------------*
|   32KB   |   30    |    61    |          31           |
*-------------------------------*-----------------------*
|   256KB  |   35    |   214    |          30           |
*-------------------------------*-----------------------*
|    1M    |   29    |   292    |          28           |
*-------------------------------*-----------------------*
|    8M    |   28    |   341    |          28           |
*-------------------------------*-----------------------*

Loopback (values are Gbit/s):

   Core i7 with nested guest.

*-------------------------------*-----------------------*
|          |         |          |                       |
| buf size |   copy  | zerocopy | copy w/o MSG_ZEROCOPY |
|          |         |          |       patchset        |
|          |         |          |                       |
*-------------------------------*-----------------------*
|   4KB    |    8    |     7    |           8           |
*-------------------------------*-----------------------*
|   32KB   |   27    |    43    |          30           |
*-------------------------------*-----------------------*
|   256KB  |   38    |   100    |          39           |
*-------------------------------*-----------------------*
|    1M    |   37    |   141    |          39           |
*-------------------------------*-----------------------*
|    8M    |   40    |   201    |          36           |
*-------------------------------*-----------------------*
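The tables can also be read as zerocopy/copy speedup ratios. The sketch below encodes the "copy" and "zerocopy" columns of the three tables above and counts the rows where zerocopy did not beat copy; struct bench_row, speedup() and count_regressions() are hypothetical helpers for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in Gbit/s for one buffer size, from the tables above. */
struct bench_row {
	const char *buf_size;
	int copy;
	int zerocopy;
};

/* zerocopy / copy; a value below 1.0 means zerocopy was slower. */
static double speedup(const struct bench_row *r)
{
	assert(r->copy > 0);
	return (double)r->zerocopy / r->copy;
}

/* Number of rows in which zerocopy did not beat copy mode. */
static size_t count_regressions(const struct bench_row *rows, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (speedup(&rows[i]) < 1.0)
			count++;
	return count;
}

static const struct bench_row g2h[] = {
	{ "4KB", 3, 11 }, { "32KB", 9, 70 }, { "256KB", 30, 224 },
	{ "1M", 27, 285 }, { "8M", 26, 365 },
};
static const struct bench_row h2g[] = {
	{ "4KB", 17, 10 }, { "32KB", 30, 61 }, { "256KB", 35, 214 },
	{ "1M", 29, 292 }, { "8M", 28, 341 },
};
static const struct bench_row loopback[] = {
	{ "4KB", 8, 7 }, { "32KB", 27, 43 }, { "256KB", 38, 100 },
	{ "1M", 37, 141 }, { "8M", 40, 201 },
};
```

With these values the ratio is below 1 only in the 4KB rows of H2G (10/17, about 0.59) and loopback (7/8, 0.875), and it peaks at about 14x for G2H with 8M buffers (365/26).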

Thanks, Arseniy

> - most systems have a copybreak scheme where buffers
>   smaller than a given size are copied directly.
>   This will address the regression you see with small buffers,
>   but we need to find that value. We know it's between 4k and 32k :)
> 
> 
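A copybreak scheme like the one Michael describes reduces to a length check on the send path, plus measurements to pick the threshold. The user-space sketch below is illustrative only: struct tput_sample, use_zerocopy() and find_copybreak() are hypothetical names, not kernel or vsock API, and the search assumes a single crossover point (copy is faster below it, zerocopy at or above it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One measurement point: throughput in Gbit/s for a given buffer size. */
struct tput_sample {
	size_t buf_size;	/* bytes */
	double copy;		/* copy mode */
	double zerocopy;	/* MSG_ZEROCOPY mode */
};

/* Copybreak decision: payloads shorter than copybreak are copied,
 * larger ones take the zerocopy path. */
static bool use_zerocopy(size_t len, size_t copybreak)
{
	return len >= copybreak;
}

/*
 * Given samples sorted by increasing buf_size, return the smallest
 * size at which zerocopy is at least as fast as copy, or 0 if it
 * never is. With only 4KB and 32KB samples this can only narrow the
 * threshold to (4KB, 32KB]; extra sizes in between refine it.
 */
static size_t find_copybreak(const struct tput_sample *s, size_t n)
{
	assert(s != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (s[i].zerocopy >= s[i].copy)
			return s[i].buf_size;
	return 0;
}
```

Fed with the 4KB and 32KB rows of the H2G table in this message, find_copybreak() returns 32768, while the G2H rows give 4096 because zerocopy already wins there at 4KB; a single threshold would therefore have to be chosen for the H2G case.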
>> During review of this series, Stefano Garzarella <sgarzare@...hat.com>
>> suggested splitting it into three parts to simplify review and merging:
>>
>> 1) virtio and vhost updates (for fragged skbs) <--- this patchset
>> 2) AF_VSOCK updates (allow enabling MSG_ZEROCOPY mode and reading
>>    tx completions) and an update for Documentation/.
>> 3) Updates for tests and utils.
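Part 2 is where applications get to read tx completions. Assuming vsock reuses the generic MSG_ZEROCOPY notification format from the kernel's msg_zerocopy documentation (an assumption here; the vsock-specific cmsg level and type come with part 2 and are not shown), decoding one notification could look like the sketch below. struct zc_completion and zc_parse_completion() are hypothetical names; struct sock_extended_err and the SO_EE_* constants are from <linux/errqueue.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>		/* struct timespec, used by <linux/errqueue.h> */
#include <linux/errqueue.h>

/* Range of MSG_ZEROCOPY send calls covered by one notification. */
struct zc_completion {
	uint32_t lo;	/* first completed send call */
	uint32_t hi;	/* last completed send call (inclusive) */
	bool copied;	/* the kernel fell back to copying the data */
};

/*
 * Decode one sock_extended_err taken from an error-queue cmsg.
 * Returns 0 on success, -EINVAL if it is not a zerocopy completion.
 */
static int zc_parse_completion(const struct sock_extended_err *serr,
			       struct zc_completion *out)
{
	assert(serr != NULL && out != NULL);
	if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY || serr->ee_errno != 0)
		return -EINVAL;
	out->lo = serr->ee_info;
	out->hi = serr->ee_data;
	out->copied = serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED;
	return 0;
}
```

One notification can acknowledge several send calls at once, which is why it carries a [lo, hi] range rather than a single id.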
>>
>> This series enables handling of fragged skbs in the virtio and vhost parts.
>> The new logic won't be triggered yet, because the SO_ZEROCOPY option still
>> cannot be enabled at this point (the next batch of patches from the big
>> set above will enable it).
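Once part 2 makes the option available, user space would presumably enable and use it the same way as for TCP, through the generic SO_ZEROCOPY socket option and the MSG_ZEROCOPY send flag. This is a sketch under that assumption: zc_enable() and zc_send() are hypothetical helper names, and the fallback constants are the asm-generic values (a few architectures use different ones).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Fallbacks for older userspace headers (asm-generic values). */
#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

/*
 * Request zerocopy transmission on a socket. Returns 0 or -errno.
 * With only this patchset applied, AF_VSOCK sockets are expected to
 * reject the option.
 */
static int zc_enable(int fd)
{
	int one = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)) < 0)
		return -errno;
	return 0;
}

/*
 * Send one buffer with MSG_ZEROCOPY. Returns bytes sent or -errno.
 * The buffer must not be modified until the matching completion has
 * been read from the socket's error queue.
 */
static ssize_t zc_send(int fd, const void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL || len == 0);
	n = send(fd, buf, len, MSG_ZEROCOPY);
	return n < 0 ? -errno : n;
}
```

Both helpers return -errno instead of -1 so a caller can tell "option not supported yet" apart from other failures without touching errno again.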
>>
>> I've included a changelog in some patches anyway, because there were
>> comments during review of the last big patchset from the link above.
>>
>> The base for this patchset is 9d0cd5d25f7d45bce01bbb3193b54ac24b3a60f3
>>
>> Link to v1:
>> https://lore.kernel.org/netdev/20230717210051.856388-1-AVKrasnov@sberdevices.ru/
>> Link to v2:
>> https://lore.kernel.org/netdev/20230718180237.3248179-1-AVKrasnov@sberdevices.ru/
>> Link to v3:
>> https://lore.kernel.org/netdev/20230720214245.457298-1-AVKrasnov@sberdevices.ru/
>>
>> Changelog:
>>  * Patchset rebased and tested on the new HEAD of net-next (see hash above).
>>  * See per-patch changelog after ---.
>>
>> Arseniy Krasnov (4):
>>   vsock/virtio/vhost: read data from non-linear skb
>>   vsock/virtio: support to send non-linear skb
>>   vsock/virtio: non-linear skb handling for tap
>>   vsock/virtio: MSG_ZEROCOPY flag support
>>
>>  drivers/vhost/vsock.c                   |  14 +-
>>  include/linux/virtio_vsock.h            |   6 +
>>  net/vmw_vsock/virtio_transport.c        |  79 +++++-
>>  net/vmw_vsock/virtio_transport_common.c | 312 ++++++++++++++++++------
>>  4 files changed, 330 insertions(+), 81 deletions(-)
>>
>> -- 
>> 2.25.1
>
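Patch 1 ("read data from non-linear skb") deals with payload that no longer sits entirely in the skb's linear buffer. The toy user-space model below only illustrates the offset arithmetic of reading across a linear head followed by fragments; struct model_skb and model_skb_copy() are hypothetical, not the kernel's struct sk_buff, and the actual patches use the kernel's own skb helpers, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_FRAGS 4

/* Toy model of a non-linear skb: a linear head followed by fragments. */
struct model_frag {
	const unsigned char *data;
	size_t len;
};

struct model_skb {
	const unsigned char *head;
	size_t head_len;
	struct model_frag frags[MODEL_MAX_FRAGS];
	size_t nr_frags;
};

/*
 * Copy len bytes starting at payload byte offset (head first, then the
 * fragments in order) into dst. Returns the number of bytes copied,
 * which is less than len if the payload ends first.
 */
static size_t model_skb_copy(const struct model_skb *skb, size_t offset,
			     unsigned char *dst, size_t len)
{
	size_t copied = 0;

	assert(skb->nr_frags <= MODEL_MAX_FRAGS);
	for (size_t i = 0; i <= skb->nr_frags && copied < len; i++) {
		const unsigned char *src = i == 0 ? skb->head : skb->frags[i - 1].data;
		size_t seg_len = i == 0 ? skb->head_len : skb->frags[i - 1].len;
		size_t n;

		if (offset >= seg_len) {	/* segment ends before offset */
			offset -= seg_len;
			continue;
		}
		n = seg_len - offset;
		if (n > len - copied)
			n = len - copied;
		memcpy(dst + copied, src + offset, n);
		copied += n;
		offset = 0;	/* later segments are read from their start */
	}
	return copied;
}
```

For a head "abc" and fragments "defg" and "hi", reading 6 bytes from offset 2 yields "cdefgh", crossing the head and both fragments in one call.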