Date:	Thu, 09 Sep 2010 16:00:24 -0700
From:	Sridhar Samudrala <sri@...ibm.com>
To:	Krishna Kumar2 <krkumar2@...ibm.com>
CC:	anthony@...emonkey.ws, davem@...emloft.net, kvm@...r.kernel.org,
	"Michael S. Tsirkin" <mst@...hat.com>, netdev@...r.kernel.org,
	rusty@...tcorp.com.au
Subject: Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

  On 9/9/2010 2:45 AM, Krishna Kumar2 wrote:
>> Krishna Kumar2/India/IBM wrote on 09/08/2010 10:17:49 PM:
> Some more results, and the likely cause of the single-netperf
> degradation, are below.
>
>
>> Guest ->  Host (single netperf):
>> I am getting a drop of almost 20%. I am trying to figure out
>> why.
>>
>> Host ->  guest (single netperf):
>> I am getting an improvement of almost 15%. Again - unexpected.
>>
>> Guest ->  Host TCP_RR: I get an average 7.4% increase in #packets
>> for runs up to 128 sessions. With fewer netperf sessions (under 8),
>> there was a drop of 3-7% in #packets, but beyond that the #packets
>> improved significantly, giving an average improvement of 7.4%.
>>
>> So it seems that fewer sessions have, for some reason, a negative
>> effect on the tx side. The code path in virtio-net has not changed
>> much, so the drop in some cases is quite unexpected.
> The drop for the single netperf seems to be due to multiple vhosts.
> I changed the patch to start a *single* vhost:
>
> Guest ->  Host (1 netperf, 64K): BW: 10.79%, SD: -1.45%
> Guest ->  Host (1 netperf)     : Latency: -3%, SD: 3.5%
I remember seeing a similar issue when using separate vhost threads for
the TX and RX queues. Basically, we should have the same vhost thread
process a TCP flow in both directions; I guess this allows the data and
the ACKs to be processed in sync.
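
To make that concrete, here is a minimal userspace sketch of the pairing
idea (this is not the vhost code itself; struct vq_pair, pick_worker and
NUM_WORKERS are made-up names): each RX virtqueue and its companion TX
virtqueue are bound to the same worker, so one thread sees both the data
and the ACKs of a given flow.

/* Build and run with: gcc -Wall pair.c && ./a.out */
#include <stdio.h>

#define NUM_WORKERS 2
#define NUM_PAIRS   4

struct vq_pair {
	int rx_vq;    /* index of the RX virtqueue */
	int tx_vq;    /* index of the paired TX virtqueue */
	int worker;   /* worker that services *both* directions */
};

/* Bind both directions of a queue pair to one worker (round-robin). */
static void pick_worker(struct vq_pair *p, int pair_idx)
{
	p->worker = pair_idx % NUM_WORKERS;
}

int main(void)
{
	struct vq_pair pairs[NUM_PAIRS];
	int i;

	for (i = 0; i < NUM_PAIRS; i++) {
		pairs[i].rx_vq = 2 * i;      /* interleaved rx/tx indices, */
		pairs[i].tx_vq = 2 * i + 1;  /* as in mq virtio-net        */
		pick_worker(&pairs[i], i);
		printf("pair %d: rxvq %d + txvq %d -> worker %d\n",
		       i, pairs[i].rx_vq, pairs[i].tx_vq, pairs[i].worker);
	}
	return 0;
}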


Thanks
Sridhar
> Single vhost performs well but hits a barrier at 16 netperf
> sessions:
>
> SINGLE vhost (Guest ->  Host):
>    1 netperf:    BW:  10.7%    SD:  -1.4%
>    4 netperfs:   BW:   3.0%    SD:   1.4%
>    8 netperfs:   BW:  17.7%    SD: -10.0%
>   16 netperfs:   BW:   4.7%    SD:  -7.0%
>   32 netperfs:   BW:  -6.1%    SD:  -5.7%
>
> BW and SD both improve (the guest's multiple txqs help); for 32
> netperfs, only SD improves.
>
> But with multiple vhosts, the guest is able to send more packets
> and BW increases much more (SD also increases, but I think
> that is expected). From the earlier results:
>
> N#    BW1     BW2     (%)        SD1     SD2     (%)        RSD1    RSD2    (%)
> ________________________________________________________________________________
> 4     26387   40716   (54.30)    20      28      (40.00)    86      85      (-1.16)
> 8     24356   41843   (71.79)    88      129     (46.59)    372     362     (-2.68)
> 16    23587   40546   (71.89)    375     564     (50.40)    1558    1519    (-2.50)
> 32    22927   39490   (72.24)    1617    2171    (34.26)    6694    5722    (-14.52)
> 48    23067   39238   (70.10)    3931    5170    (31.51)    15823   13552   (-14.35)
> 64    22927   38750   (69.01)    7142    9914    (38.81)    28972   26173   (-9.66)
> 96    22568   38520   (70.68)    16258   27844   (71.26)    65944   73031   (10.74)
> ________________________________________________________________________________
> (All tests were done without any tuning)
>
> From my testing:
>
> 1. Single vhost improves mq guest performance up to 16
>    netperfs, but degrades it after that.
> 2. Multiple vhosts degrade single-netperf guest
>    performance, but significantly improve performance
>    for larger numbers of netperf sessions.
>
> Likely cause of the single-stream degradation with the
> multiple-vhost patch:
>
> 1. Two vhosts run, handling RX and TX respectively.
>    I think the issue is related to cache ping-pong, especially
>    since these run on different cpus/sockets.
> 2. I (re-)modified the patch to share RX with TX[0]. The
>    performance drop is the same, but the reason is that the
>    guest is not using txq[0] in most cases (dev_pick_tx),
>    so vhost's rx and tx run on different threads.
>    But whenever the guest does use txq[0], only one vhost
>    runs and the performance is similar to the original.
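
To illustrate the dev_pick_tx() point above: the guest's tx queue choice
follows a hash of the flow, so with 16 tx queues a given connection lands
on txq[0] only about one time in sixteen. The sketch below is a
simplified stand-in, not the kernel's skb_tx_hash(); pick_tx_queue() and
its mixing constant are made up.

#include <stdint.h>
#include <stdio.h>

/* Toy flow hash over the 4-tuple; the kernel uses jhash on skb fields. */
static uint16_t pick_tx_queue(uint32_t saddr, uint32_t daddr,
			      uint16_t sport, uint16_t dport,
			      uint16_t num_tx_queues)
{
	uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport);

	h ^= h >> 16;
	h *= 0x45d9f3bU;     /* arbitrary mixing constant */
	h ^= h >> 16;
	return (uint16_t)(h % num_tx_queues);
}

int main(void)
{
	uint16_t sport;

	/* Different source ports (i.e. different flows) spread across
	 * the 16 tx queues; txq[0] is hit only occasionally. */
	for (sport = 50000; sport < 50008; sport++)
		printf("sport %u -> txq %u\n", sport,
		       pick_tx_queue(0x0a000001, 0x0a000002, sport,
				     12865, 16));
	return 0;
}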
>
> I went back to my *submitted* patch, started a guest
> with numtxq=16, and pinned every vhost to cpus #0 and #1. Now,
> whether the guest uses txq[0] or txq[n], the performance is
> similar to or better than the original code (by 10-27% across
> 10 runs). Also, SD changed by -6% to -24% (an improvement).
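
As a rough sketch of the pinning step (not part of the patch; it assumes
the vhost kernel threads' PIDs have already been looked up, e.g. with ps,
and is just the equivalent of "taskset -pc 0,1 <pid>" on each thread):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Pin one task (here: a vhost thread's PID) to cpus 0 and 1. */
static int pin_to_cpus_0_and_1(pid_t pid)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);
	CPU_SET(1, &set);
	return sched_setaffinity(pid, sizeof(set), &set);
}

int main(int argc, char **argv)
{
	int i;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <vhost-thread-pid>...\n", argv[0]);
		return 1;
	}
	for (i = 1; i < argc; i++) {
		pid_t pid = (pid_t)atoi(argv[i]);

		if (pin_to_cpus_0_and_1(pid))
			perror("sched_setaffinity");
		else
			printf("pinned %d to cpus 0-1\n", (int)pid);
	}
	return 0;
}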
>
> I will start a full test run of the original vs. the submitted
> code with minimal tuning (Avi also suggested the same),
> and re-send. Please let me know if you need any other
> data.
>
> Thanks,
>
> - KK
>


