Message-ID: <376f8939-1990-abf6-1f5f-57b3822f94fe@redhat.com>
Date: Mon, 23 Oct 2017 10:06:36 +0800
From: Jason Wang <jasowang@...hat.com>
To: Matthew Rosato <mjrosato@...ux.vnet.ibm.com>,
Wei Xu <wexu@...hat.com>, mst@...hat.com
Cc: netdev@...r.kernel.org, davem@...emloft.net
Subject: Re: Regression in throughput between kvm guests over virtual bridge
On 2017/10/19 04:17, Matthew Rosato wrote:
>> 2. It might be useful to shorten the traffic path as a reference. What I am
>> running is roughly:
>> pktgen(host kernel) -> tap(x) -> guest(DPDK testpmd)
>>
>> In my experience the bridge driver (br_forward(), etc.) can impact
>> performance, so I eventually settled on this simplified testbed, which fully
>> isolates the traffic from both userspace and the host kernel stack (1 and 50
>> instances, bridge driver, etc.) and therefore reduces potential interference.
>>
>> The downside of this is that it needs DPDK support in the guest; has this
>> ever been run on an s390x guest? An alternative approach is to directly run
>> XDP drop on the virtio-net NIC in the guest, though this requires compiling
>> the XDP program inside the guest, which needs a newer distro (Fedora 25+ in
>> my case, or Ubuntu 16.10, not sure).
>>
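For reference, the host side of that path can be driven by writing the
usual commands to pktgen's /proc interface; a minimal sketch follows
(tap0, the dst IP/MAC and the count are placeholders, and it assumes
"modprobe pktgen" has already been done):

    /* pktgen_tap.c - drive the in-kernel pktgen at a tap device */
    #include <stdio.h>
    #include <stdlib.h>

    static void pg(const char *path, const char *cmd)
    {
            FILE *f = fopen(path, "w");

            if (!f) {
                    perror(path);
                    exit(1);
            }
            fprintf(f, "%s\n", cmd);
            fclose(f);
    }

    int main(void)
    {
            /* bind tap0 (placeholder name) to the first pktgen thread */
            pg("/proc/net/pktgen/kpktgend_0", "rem_device_all");
            pg("/proc/net/pktgen/kpktgend_0", "add_device tap0");

            /* placeholder traffic parameters */
            pg("/proc/net/pktgen/tap0", "count 10000000");
            pg("/proc/net/pktgen/tap0", "pkt_size 64");
            pg("/proc/net/pktgen/tap0", "delay 0");
            pg("/proc/net/pktgen/tap0", "dst 10.0.0.2");
            pg("/proc/net/pktgen/tap0", "dst_mac 52:54:00:12:34:56");

            /* the write to pgctrl blocks until the run completes */
            pg("/proc/net/pktgen/pgctrl", "start");
            return 0;
    }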
> I made an attempt at DPDK, but it has not been run on s390x as far as
> I'm aware and didn't seem trivial to get working.
>
> So instead I took your alternate suggestion & did:
> pktgen(host) -> tap(x) -> guest(xdp_drop)
>
> When running this setup, I am not able to reproduce the regression. As
> mentioned previously, I am also unable to reproduce when running one end
> of the uperf connection from the host - I have only ever been able to
> reproduce when both ends of the uperf connection are running within a guest.
>
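For reference, the xdp_drop program itself can be as small as the sketch
below (the standard minimal XDP drop; the section name and the exact
attach command depend on the clang/iproute2 versions in the guest):

    /* xdp_drop.c - drop every packet at the earliest driver point */
    #include <linux/bpf.h>

    #define SEC(name) __attribute__((section(name), used))

    SEC("prog")
    int xdp_drop(struct xdp_md *ctx)
    {
            return XDP_DROP;
    }

    char _license[] SEC("license") = "GPL";

Something like "clang -O2 -target bpf -c xdp_drop.c -o xdp_drop.o"
builds it, and "ip link set dev eth0 xdp obj xdp_drop.o" attaches it
(eth0 is a placeholder for the guest virtio-net device).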
Thanks for the test. Looking at the code, the only obvious difference
when BATCH is 1 is that the spinlock previously taken via tun_peek_len()
is avoided, since we can do the peek locally (see the toy sketch after
the list below for the idea). I wonder whether this speeds up handle_rx()
a little, which then leads to more wakeups at certain rates/sizes of TCP
stream. To verify this, maybe you can try:
- enable busy polling (using poll-us=1000) and see if we can still
get the regression
- measure the pps of pktgen(vm1) -> tap1 -> bridge -> tap2 -> vm2
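To illustrate "do the peek locally": below is a toy user-space sketch,
not the actual tun/vhost code (the ring and all names here are made up),
of why a batched dequeue cuts lock traffic compared to peeking under the
lock for every packet:

    /* toy_ring.c - illustration only, not kernel code */
    #include <pthread.h>
    #include <stddef.h>

    #define RING_SIZE 256
    #define RX_BATCH   64

    struct toy_ring {
            pthread_mutex_t lock;   /* stands in for the ring spinlock */
            void *pkts[RING_SIZE];
            unsigned int head, tail;
    };

    /* BATCH==1 style: every iteration of the rx loop pays for the
     * lock, the way the per-packet peek path does */
    static void *peek_one(struct toy_ring *r)
    {
            void *p;

            pthread_mutex_lock(&r->lock);
            p = (r->head == r->tail) ? NULL
                                     : r->pkts[r->head % RING_SIZE];
            pthread_mutex_unlock(&r->lock);
            return p;
    }

    /* batched style: take the lock once per up to RX_BATCH packets,
     * then serve later peeks from the local array with no lock */
    static size_t dequeue_batch(struct toy_ring *r, void *batch[],
                                size_t max)
    {
            size_t n = 0;

            pthread_mutex_lock(&r->lock);
            while (n < max && r->head != r->tail)
                    batch[n++] = r->pkts[r->head++ % RING_SIZE];
            pthread_mutex_unlock(&r->lock);
            return n;
    }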
Michael, do you have any other possibilities in mind?
Thanks