Message-Id: <c0b42b27-56f6-c2f9-9476-28d25678808a@linux.vnet.ibm.com>
Date: Wed, 18 Oct 2017 16:17:51 -0400
From: Matthew Rosato <mjrosato@...ux.vnet.ibm.com>
To: Wei Xu <wexu@...hat.com>
Cc: Jason Wang <jasowang@...hat.com>, netdev@...r.kernel.org,
davem@...emloft.net, mst@...hat.com
Subject: Re: Regression in throughput between kvm guests over virtual bridge
On 10/12/2017 02:31 PM, Wei Xu wrote:
> On Thu, Oct 05, 2017 at 04:07:45PM -0400, Matthew Rosato wrote:
>>
>> Ping... Jason, any other ideas or suggestions?
>
> Hi Matthew,
> Recently I have been running a similar test on x86 for this patch; here
> are some differences between our testbeds.
>
> 1. It is nice that you got an improvement with 50+ instances (or
> connections here?), which will be quite helpful for addressing the issue,
> and that you've figured out the cost (wait/wakeup). A kind reminder: did
> you pin the uperf client/server along the whole path, besides the vhost
> and vcpu threads?
I was not previously doing any pinning whatsoever, just reproducing an
environment that one of our testers here was running. Reducing the guest
vcpu count from 4 to 1, I still see the regression. I then pinned each
vcpu thread and vhost thread to a separate host CPU -- that still made
no difference (the regression is still present).
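
Pinning a vhost or vcpu thread boils down to setting CPU affinity on its
host TID; below is a minimal C sketch of that mechanism, equivalent to
`taskset -pc <cpu> <tid>` (the TID and CPU arguments are placeholders,
not the values used in this thread):

/* pin_tid.c - pin one task (e.g. a vhost or vcpu thread, identified by
 * its host TID) to a single host CPU. The TID/CPU arguments are
 * placeholders for whatever ps -eLf shows for the vhost-<qemu-pid> and
 * vcpu threads. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <tid> <cpu>\n", argv[0]);
        return 1;
    }

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(atoi(argv[2]), &set);

    /* affinity applied to a single TID pins just that thread */
    if (sched_setaffinity((pid_t)atoi(argv[1]), sizeof(set), &set)) {
        perror("sched_setaffinity");
        return 1;
    }
    return 0;
}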
>
> 2. It might be useful to shorten the traffic path as a reference. What I
> am running is briefly:
> pktgen(host kernel) -> tap(x) -> guest(DPDK testpmd)
>
> In my experience the bridge driver (br_forward(), etc.) can impact
> performance, so eventually I settled on this simplified testbed, which
> fully isolates the traffic from both userspace and the host kernel stack
> (1 and 50 instances, bridge driver, etc.) and therefore reduces potential
> interference.
>
> The downside of this is that it needs DPDK support in the guest; has this
> ever been run on an s390x guest? An alternative approach is to run XDP
> drop directly on the virtio-net NIC in the guest, although this requires
> compiling XDP inside the guest, which needs a newer distro (Fedora 25+ in
> my case, or Ubuntu 16.10, not sure).
>
I made an attempt at DPDK, but as far as I'm aware it has not been run
on s390x, and it did not seem trivial to get working.
So instead I took your alternate suggestion & did:
pktgen(host) -> tap(x) -> guest(xdp_drop)
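
For the host-side leg, the in-kernel pktgen is driven through its procfs
control files; a minimal sketch in C follows (the device name, destination
address/MAC, and counts are illustrative placeholders, and pktgen must be
loaded with `modprobe pktgen` first):

/* pktgen_setup.c - configure and start the in-kernel pktgen against a
 * tap device by writing its procfs control files. */
#include <stdio.h>
#include <stdlib.h>

static void pg_write(const char *path, const char *cmd)
{
    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        exit(1);
    }
    fprintf(f, "%s\n", cmd);
    fclose(f);
}

int main(void)
{
    /* bind the tap device to the first pktgen kernel thread */
    pg_write("/proc/net/pktgen/kpktgend_0", "rem_device_all");
    pg_write("/proc/net/pktgen/kpktgend_0", "add_device tap0");

    /* configure the flow on the device's control file */
    pg_write("/proc/net/pktgen/tap0", "count 10000000");
    pg_write("/proc/net/pktgen/tap0", "pkt_size 64");
    pg_write("/proc/net/pktgen/tap0", "dst 10.0.0.2");
    pg_write("/proc/net/pktgen/tap0", "dst_mac 52:54:00:12:34:56");

    /* start all pktgen threads (blocks until count is reached) */
    pg_write("/proc/net/pktgen/pgctrl", "start");
    return 0;
}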
When running this setup, I am not able to reproduce the regression. As
mentioned previously, I am also unable to reproduce it when running one
end of the uperf connection from the host -- I have only ever been able
to reproduce it when both ends of the uperf connection run within guests.
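
For the guest-side leg, the xdp_drop program can be as small as the
sketch below -- a minimal XDP program that unconditionally drops packets
(this is the generic pattern, not necessarily the exact program used
here); build with clang for the bpf target and attach with
`ip link set dev <nic> xdp obj xdp_drop.o sec xdp`:

/* xdp_drop.c - minimal XDP program that drops every packet.
 * Build: clang -O2 -target bpf -c xdp_drop.c -o xdp_drop.o */
#include <linux/bpf.h>

#ifndef SEC
#define SEC(name) __attribute__((section(name), used))
#endif

SEC("xdp")
int xdp_drop_prog(struct xdp_md *ctx)
{
    return XDP_DROP;   /* drop at the earliest point in the driver */
}

char _license[] SEC("license") = "GPL";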
> 3. BTW, did you enable hugepages for your guest? It can affect
> performance more or less depending on the memory demand when generating
> traffic; I didn't see a similar option on your command line.
>
s390x does not currently support passing through hugetlb backing via
QEMU mem-path.
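
For context, hugetlb backing means the guest RAM is mapped from huge
pages; a minimal sketch of the underlying mechanism is below (QEMU's
-mem-path/-mem-prealloc does the hugetlbfs-file equivalent of this; the
size is a placeholder, and huge pages must be reserved via
/proc/sys/vm/nr_hugepages first):

/* hugemap.c - map anonymous memory backed by huge pages. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* 1 GiB of anonymous memory backed by 2 MiB huge pages */
    size_t len = 512UL * 2 * 1024 * 1024;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");  /* fails if none are reserved */
        return 1;
    }
    memset(p, 0, len);  /* touch the range so pages actually fault in */
    munmap(p, len);
    return 0;
}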