Date:	Tue, 02 Dec 2014 09:59:48 +0008
From:	Jason Wang <jasowang@...hat.com>
To:	"Michael S. Tsirkin" <mst@...hat.com>
Cc:	virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, davem@...emloft.net,
	pagupta@...hat.com
Subject: Re: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts



On Tue, Dec 2, 2014 at 5:43 PM, Michael S. Tsirkin <mst@...hat.com> wrote:
> On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
>>  
>>  
>>  On Tue, Dec 2, 2014 at 11:15 AM, Jason Wang <jasowang@...hat.com> wrote:
>>  >
>>  >
>>  >On Mon, Dec 1, 2014 at 6:42 PM, Michael S. Tsirkin <mst@...hat.com> wrote:
>>  >>On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
>>  >>> Hello:
>>  >>>
>>  >>> We used to orphan packets before transmission for virtio-net. This
>>  >>> breaks socket accounting and can prevent several functions from
>>  >>> working, e.g.:
>>  >>>
>>  >>> - Byte Queue Limit depends on tx completion notification to work.
>>  >>> - Packet Generator depends on tx completion notification for the
>>  >>>   last transmitted packet to complete.
>>  >>> - TCP Small Queue depends on proper accounting of sk_wmem_alloc to
>>  >>>   work.
>>  >>>
>>  >>> This series tries to solve the issue by enabling tx interrupts. To
>>  >>> minimize the performance impact of this, several optimizations were
>>  >>> used:
>>  >>>
>>  >>> - On the guest side, virtqueue_enable_cb_delayed() was used to delay
>>  >>>   the tx interrupt until 3/4 of the pending packets were sent.
>>  >>> - On the host side, interrupt coalescing was used to reduce tx
>>  >>>   interrupts.
>>  >>>
>>  >>> Performance test results[1] (tx-frames 16 tx-usecs 16) show:
>>  >>>
>>  >>> - For guest receiving: no obvious regression in throughput was
>>  >>>   noticed. More cpu utilization was noticed in a few cases.
>>  >>> - For guest transmission: a very large improvement in throughput for
>>  >>>   small packet transmission was noticed. This is expected since TSQ
>>  >>>   and other optimizations for small packet transmission work after
>>  >>>   tx interrupt. But it will use more cpu for large packets.
>>  >>> - For TCP_RR, a regression (10% on transaction rate and cpu
>>  >>>   utilization) was found. Tx interrupt won't help but causes
>>  >>>   overhead in this case. Using more aggressive coalescing parameters
>>  >>>   may help to reduce the regression.
>>  >>
>>  >>OK, you have posted coalescing patches - do they help any?
>>  >
>>  >Helps a lot.
>>  >
>>  >For RX, it saves about 5% - 10% cpu. (reduces tx intrs by 60%-90%)
>>  >For small packet TX, it increases throughput by 33% - 245%. (reduces
>>  >intrs by about 60%)
>>  >For TCP_RR, it increases the trans. rate by 3%-10%. (reduces tx intrs
>>  >by 40%-80%)
>>  >
>>  >>
>>  >>I'm not sure the regression is due to interrupts.
>>  >>It would make sense for CPU but why would it
>>  >>hurt transaction rate?
>>  >
>>  >Anyway, the guest needs to take some cycles to handle tx interrupts.
>>  >And the transaction rate does increase if we coalesce more tx
>>  >interrupts.
>>  >>
>>  >>
>>  >>It's possible that we are deferring kicks too much due to BQL.
>>  >>
>>  >>As an experiment: do we get any of it back if we do
>>  >>-        if (kick || netif_xmit_stopped(txq))
>>  >>-                virtqueue_kick(sq->vq);
>>  >>+        virtqueue_kick(sq->vq);
>>  >>?
>>  >
>>  >
>>  >I will try, but during TCP_RR at most 1 packet was pending,
>>  >so I doubt BQL can help in this case.
>>  
>>  Looks like this helps a lot in multiple sessions of TCP_RR.
> 
> so what's faster
> 	BQL + kick each packet
> 	no BQL
> ?

Quick manual tests (TCP_RR 64, TCP_STREAM 512) do not
show obvious differences.

May need a complete benchmark to see.
> 
> 
>>  How about moving the BQL patch out of this series?
>>  
>>  Let's first converge on tx interrupts and then introduce it?
>>  (e.g. with kicking after queuing X bytes?)
> 
> Sounds good.
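
For reference, the "kick after queuing X bytes" idea could look roughly
like the sketch below. This is a standalone illustration only, not
virtio_net code: struct tx_kick_state, tx_should_kick() and the threshold
field are hypothetical names, and the "more" and "stopped" flags merely
stand in for whatever the driver would use to learn that more packets are
coming or that the queue is stopped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue state; not a real virtio_net structure. */
struct tx_kick_state {
	size_t pending_bytes;	/* bytes queued since the last kick */
	size_t kick_threshold;	/* "X": kick once this many bytes are pending */
};

/*
 * Account for one queued packet of 'len' bytes and decide whether the
 * host should be notified now. We kick when the byte threshold has been
 * reached, when no further packets are expected right away (!more), or
 * when the queue has been stopped. The counter resets on every kick.
 */
static bool tx_should_kick(struct tx_kick_state *s, size_t len,
			   bool more, bool stopped)
{
	s->pending_bytes += len;
	if (s->pending_bytes >= s->kick_threshold || !more || stopped) {
		s->pending_bytes = 0;
		return true;
	}
	return false;
}
```

With a threshold of 1000 bytes and more packets always expected, three
400-byte packets would produce a single kick, on the third packet.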
