Date:   Wed, 27 Sep 2017 10:04:18 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched
 processing



On 09/27/2017 03:25, Michael S. Tsirkin wrote:
> On Fri, Sep 22, 2017 at 04:02:35PM +0800, Jason Wang wrote:
>> This patch implements basic batched processing of the tx virtqueue by
>> prefetching desc indices and updating the used ring in a batch. For
>> the non-zerocopy case, vq->heads is used to store the prefetched
>> indices and to update the used ring. This is also a prerequisite for
>> doing more batching on top. For the zerocopy case, and for simplicity,
>> batched processing is simply disabled by fetching and processing only
>> one descriptor at a time; this could be optimized in the future.
>>
>> XDP_DROP (without touching the skb) on tun (with MoonGen in the guest)
>> with zerocopy disabled:
>>
>> Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz:
>> Before: 3.20Mpps
>> After:  3.90Mpps (+22%)
>>
>> No differences were seen with zerocopy enabled.
>>
>> Signed-off-by: Jason Wang <jasowang@...hat.com>
> So where is the speedup coming from? I'd guess the ring is
> hot in cache, so it's faster to access it in one go and then
> pass many packets to the net stack. Is that right?
>
> Another possibility is better code cache locality.

Yes, I think the speedup comes from:

- fewer cache misses
- less cache-line bouncing when the virtqueue is about to be full (the 
guest being faster than the host, which is the case with MoonGen)
- fewer memory barriers (see the sketch below)
- possibly faster copies from using copy_to_user() on modern CPUs
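
To illustrate the barrier point, here is a rough sketch using the 
existing vhost core helpers (simplified, ignoring endian conversion and 
error handling; this is just the idea, not the patch itself):

    /* Per-descriptor publishing: each vhost_add_used() call pays for
     * its own smp_wmb() and used->idx write before the guest can see
     * the entry. */
    for (i = 0; i < n; i++)
            vhost_add_used(vq, heads[i].id, heads[i].len);

    /* Batched publishing: write all n used elements first, then a
     * single barrier and a single used->idx update cover the whole
     * batch. */
    vhost_add_used_n(vq, heads, n);
    vhost_signal(&net->dev, vq);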

>
> So how about refactoring this patchset:
>
> 1. use existing APIs: first get the packets, then transmit
>     them all, then use them all

Looks like the current API cannot get packets first; it only supports 
getting one descriptor at a time (if you mean vhost_get_vq_desc()). And 
the used-ring updating may take more cache misses in this case.
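
For reference, the current pattern in handle_tx() is roughly this 
(simplified from memory, with error handling and zerocopy omitted):

    for (;;) {
            /* One call returns one descriptor chain (its head index),
             * or vq->num when the ring is empty. */
            head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
                                     &out, &in, NULL, NULL);
            if (head == vq->num)
                    break;
            /* ... build one msghdr from vq->iov, send one packet ... */
            vhost_add_used_and_signal(&net->dev, vq, head, 0);
    }

So each iteration fetches and publishes a single descriptor; prefetching 
into vq->heads first is what lets the used-ring writes be batched.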

> 2. add new APIs and move the loop into vhost core
>     for more speedups

I don't see any advantages; it looks like we would just need some 
callbacks in this case.
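
Just to be concrete, I imagine something like the following would be 
enough (a hypothetical interface with invented names, nothing like this 
exists in the tree):

    /* Hypothetical only: vhost core fetches a batch of descriptors
     * and hands each one to the device via a callback. */
    struct vhost_batch_ops {
            int (*handle_desc)(struct vhost_virtqueue *vq,
                               struct vring_used_elem *elem);
    };

    int vhost_handle_batch(struct vhost_virtqueue *vq,
                           const struct vhost_batch_ops *ops,
                           unsigned int batch);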

Thanks
