Message-Id: <9996b0f1-ffa6-ff95-2e9c-0deccf4623ae@linux.vnet.ibm.com>
Date: Tue, 14 Nov 2017 15:11:44 -0500
From: Matthew Rosato <mjrosato@...ux.vnet.ibm.com>
To: Wei Xu <wexu@...hat.com>
Cc: Jason Wang <jasowang@...hat.com>, mst@...hat.com,
netdev@...r.kernel.org, davem@...emloft.net
Subject: Re: Regression in throughput between kvm guests over virtual bridge
On 11/12/2017 01:34 PM, Wei Xu wrote:
> On Sat, Nov 11, 2017 at 03:59:54PM -0500, Matthew Rosato wrote:
>>>> This case should be quite similar to pktgen: if you got an improvement with
>>>> pktgen, it is usually the same for UDP. Could you please try disabling tso,
>>>> gso, gro, and ufo on all host tap devices and guest virtio-net devices?
>>>> Currently the most significant tests would look like this AFAICT:
>>>>
>>>> Host->VM 4.12 4.13
>>>> TCP:
>>>> UDP:
>>>> pktgen:
>>>>
>>>> Don't want to bother you too much, so 4.12 & 4.13 without Jason's patch should
>>>> be enough, since we have already seen positive numbers for that; you can also
>>>> temporarily skip net-next.
>>>
>>> Here are the requested numbers, averaged over numerous runs -- guest is
>>> 4GB+1vcpu, host uperf/pktgen bound to 1 host CPU + qemu and vhost thread
>>> pinned to other unique host CPUs. tso, gso, gro, ufo disabled on host
>>> taps / guest virtio-net devs as requested:
>>>
>>> Host->VM 4.12 4.13
>>> TCP: 9.92Gb/s 6.44Gb/s
>>> UDP: 5.77Gb/s 6.63Gb/s
>>> pktgen: 1572403pps 1904265pps
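(For reference, the pktgen numbers above come from the in-kernel pktgen. A
minimal single-threaded setup looks roughly like the following; the device
name, destination IP, and MAC are placeholders:)

  modprobe pktgen
  # attach the device to the pktgen thread for CPU 0
  echo "rem_device_all" > /proc/net/pktgen/kpktgend_0
  echo "add_device tap0" > /proc/net/pktgen/kpktgend_0
  # configure the stream on the device
  echo "count 10000000" > /proc/net/pktgen/tap0
  echo "pkt_size 60" > /proc/net/pktgen/tap0
  echo "delay 0" > /proc/net/pktgen/tap0
  echo "dst 10.0.0.2" > /proc/net/pktgen/tap0
  echo "dst_mac 52:54:00:00:00:01" > /proc/net/pktgen/tap0
  # start, then read the pps result back
  echo "start" > /proc/net/pktgen/pgctrl
  cat /proc/net/pktgen/tap0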
>>>
>>> UDP/pktgen both show improvement from 4.12->4.13. More interesting,
>>> however, is that I am seeing the TCP regression for the first time from
>>> host->VM. I wonder if the combination of CPU binding + disabling of one
>>> or more of tso/gso/gro/ufo is related.
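(If the combination is the trigger, one way to narrow it down would be to
re-run the TCP test with exactly one offload disabled at a time; tap0 below
stands in for the actual host tap device:)

  for feat in tso gso gro ufo; do
      # restore all offloads, then turn off just this one
      ethtool -K tap0 tso on gso on gro on ufo on
      ethtool -K tap0 $feat off
      # re-run the Host->VM TCP test here and record the throughput
  done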
>>>
>>>>
>>>> If you see that UDP and pktgen are aligned, then it might be helpful to continue
>>>> with the other two cases; otherwise we are already failing at the first step.
>>>
>>
>> I continued running many iterations of these tests between 4.12 and
>> 4.13. My throughput findings can be summarized as:
>
> Really nice to have these numbers.
>
Wasn't sure if you were asking for the individual #s -- Just in case,
here are the other averages I used to draw my conclusions:
VM->VM      4.12        4.13
UDP         9.06Gb/s    8.99Gb/s
TCP         9.16Gb/s    8.67Gb/s

VM->Host    4.12        4.13
UDP         9.70Gb/s    9.53Gb/s
TCP         6.12Gb/s    6.00Gb/s
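These runs used the same binding and offload configuration as before; for
reference, that amounts to something like the following (CPU numbers, PIDs,
and device names are illustrative):

  # pin the uperf/pktgen, qemu, and vhost threads to distinct host CPUs
  taskset -cp 0 $UPERF_PID
  taskset -cp 1 $QEMU_PID
  taskset -cp 2 $VHOST_PID
  # disable offloads on the host tap, and on virtio-net inside the guest
  ethtool -K tap0 tso off gso off gro off ufo off
  ethtool -K eth0 tso off gso off gro off ufo off    # run inside the guest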
>>
>> VM->VM case:
>> UDP: roughly equivalent
>> TCP: Consistent regression (5-10%)
>>
>> VM->Host
>> Both UDP and TCP traffic are roughly equivalent.
>
> The patch improves Rx performance from the guest's point of view, so Tx should
> show no big difference, since there are far fewer Rx packets than Tx packets in
> this case.
>
>>
>> Host->VM
>> UDP+pktgen: improvement (5-10%), but inconsistent
>> TCP: Consistent regression (25-30%)
>
> Maybe we can try to figure out this case first since it is the shortest path.
> Can you have a look at the TCP statistics and paste a few outputs between tests?
> I suspect there is some retransmitting, zero-window probing, etc.
>
Grabbed some netstat -s results after a few minutes of running (snipped the
uninteresting icmp and udp sections). The test was the TCP Host->VM scenario,
with binding and tso/gso/gro/ufo disabled as before. The capture method is
sketched below, followed by the outputs:
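(Counters were snapshotted on both ends roughly like this; file paths are
arbitrary:)

  netstat -s > /tmp/netstat.before
  # ... let the uperf TCP Host->VM test run for a few minutes ...
  netstat -s > /tmp/netstat.after
  diff /tmp/netstat.before /tmp/netstat.after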
Host 4.12
Ip:
Forwarding: 1
3724964 total packets received
0 forwarded
0 incoming packets discarded
3724964 incoming packets delivered
5000026 requests sent out
Tcp:
4 active connection openings
1 passive connection openings
0 failed connection attempts
0 connection resets received
1 connections established
3724954 segments received
133112205 segments sent out
93106 segments retransmitted
0 bad segments received
2 resets sent
TcpExt:
5 delayed acks sent
8 packets directly queued to recvmsg prequeue
TCPDirectCopyFromPrequeue: 1736
146 packet headers predicted
4 packet headers predicted and directly queued to user
3218205 acknowledgments not containing data payload received
506561 predicted acknowledgments
TCPSackRecovery: 2096
TCPLostRetransmit: 860
93106 fast retransmits
TCPLossProbes: 5
TCPSackShifted: 1959097
TCPSackMerged: 458343
TCPSackShiftFallback: 7969
TCPRcvCoalesce: 2
TCPOrigDataSent: 133112178
TCPHystartTrainDetect: 2
TCPHystartTrainCwnd: 96
TCPWinProbe: 2
IpExt:
InBcastPkts: 4
InOctets: 226014831
OutOctets: 193103919403
InBcastOctets: 1312
InNoECTPkts: 3724964
Host 4.13
Ip:
Forwarding: 1
5930785 total packets received
0 forwarded
0 incoming packets discarded
5930785 incoming packets delivered
4495113 requests sent out
Tcp:
4 active connection openings
1 passive connection openings
0 failed connection attempts
0 connection resets received
1 connections established
5930775 segments received
73226521 segments sent out
13975 segments retransmitted
0 bad segments received
4 resets sent
TcpExt:
5 delayed acks sent
8 packets directly queued to recvmsg prequeue
TCPDirectCopyFromPrequeue: 1736
18 packet headers predicted
4 packet headers predicted and directly queued to user
4091720 acknowledgments not containing data payload received
1838984 predicted acknowledgments
TCPSackRecovery: 9920
TCPLostRetransmit: 31
13975 fast retransmits
TCPLossProbes: 6
TCPSackShifted: 1700187
TCPSackMerged: 1143698
TCPSackShiftFallback: 23839
TCPRcvCoalesce: 2
TCPOrigDataSent: 73226494
TCPHystartTrainDetect: 2
TCPHystartTrainCwnd: 530
IpExt:
InBcastPkts: 4
InOctets: 344809215
OutOctets: 106285682663
InBcastOctets: 1312
InNoECTPkts: 5930785
Guest 4.12
Ip:
133112471 total packets received
1 with invalid addresses
0 forwarded
0 incoming packets discarded
133112470 incoming packets delivered
3724897 requests sent out
40 outgoing packets dropped
Tcp:
0 active connections openings
6 passive connection openings
0 failed connection attempts
2 connection resets received
2 connections established
133112301 segments received
3724731 segments send out
0 segments retransmited
0 bad segments received.
5 resets sent
TcpExt:
1 TCP sockets finished time wait in fast timer
13 delayed acks sent
138408 packets directly queued to recvmsg prequeue.
33119208 bytes directly in process context from backlog
1907783720 bytes directly received in process context from prequeue
127259218 packet headers predicted
1313774 packets header predicted and directly queued to user
24 acknowledgments not containing data payload received
196 predicted acknowledgments
2 connections reset due to early user close
TCPRcvCoalesce: 117069950
TCPOFOQueue: 2425393
TCPFromZeroWindowAdv: 109
TCPToZeroWindowAdv: 109
TCPWantZeroWindowAdv: 4487
TCPOrigDataSent: 223
TCPACKSkippedSeq: 1
IpExt:
InBcastPkts: 2
InOctets: 199630961414
OutOctets: 226019278
InBcastOctets: 656
InNoECTPkts: 133112471
Guest 4.13
Ip:
73226690 total packets received
1 with invalid addresses
0 forwarded
0 incoming packets discarded
73226689 incoming packets delivered
5930853 requests sent out
40 outgoing packets dropped
Tcp:
0 active connections openings
6 passive connection openings
0 failed connection attempts
2 connection resets received
2 connections established
73226522 segments received
5930688 segments send out
0 segments retransmited
0 bad segments received.
2 resets sent
TcpExt:
1 TCP sockets finished time wait in fast timer
13 delayed acks sent
490503 packets directly queued to recvmsg prequeue.
306976 bytes directly in process context from backlog
6875924176 bytes directly received in process context from prequeue
65617512 packet headers predicted
4735750 packets header predicted and directly queued to user
20 acknowledgments not containing data payload received
61 predicted acknowledgments
2 connections reset due to early user close
TCPRcvCoalesce: 60654609
TCPOFOQueue: 2857814
TCPOrigDataSent: 85
IpExt:
InBcastPkts: 1
InOctets: 109839485374
OutOctets: 344816614
InBcastOctets: 328
InNoECTPkts: 73226690
>>
>> Host->VM UDP and pktgen seemed to show improvement in some runs, and in
>> others seemed to mirror 4.12-level performance.
>>
>> The TCP regression for VM->VM is no surprise, we started with that.
>> It's still consistent, but smaller in this specific environment.
>
> Right, there are too many factors that might influence the performance.
>
>>
>> The TCP regression in Host->VM is interesting because I wasn't seeing it
>> consistently before binding CPUs + disabling tso/gso/gro/ufo. Also
>> interesting because of how large it is -- By any chance can you see this
>> regression on x86 with the same configuration?
>
> Had a quick test and it seems I also see a drop on x86 without tso, gso, etc.
> Data below is with/without tso, gso, etc.; I will check out the TCP statistics
> and let you know soon.
>
> 4.12
> --------------------------------------------------------------------------
> master 32.34s 112.63GB 29.91Gb/s 4031090 0.00
> master 32.33s 32.58GB 8.66Gb/s 1166014 0.00
> -------------------------------------------------------------------------
>
> 4.13
> -------------------------------------------------------------------------
> master 32.35s 119.17GB 31.64Gb/s 4265190 0.00
> master 32.33s 27.02GB 7.18Gb/s 967007 0.00
> -------------------------------------------------------------------------
>
> Wei
>