Date:	Mon, 11 Nov 2013 15:05:54 -0000
From:	"David Laight" <David.Laight@...LAB.COM>
To:	"Eric Dumazet" <eric.dumazet@...il.com>,
	"Sujith Manoharan" <sujith@...jith.org>
Cc:	<netdev@...r.kernel.org>, "Dave Taht" <dave.taht@...il.com>
Subject: RE: TCP performance regression

> On Mon, 2013-11-11 at 13:49 +0530, Sujith Manoharan wrote:
> 
> > I am not really clear on how this regression can be fixed in the driver
> > since the majority of the transmission/aggregation logic is present in the
> > TX completion path.
> 
> We have many choices.
> 
> 1) Add back a minimum of ~128 K of outstanding bytes per TCP session,
>    so that buggy drivers can sustain 'line rate'.
> 
>    Note that with 100 concurrent TCP streams, total amount of bytes
>    queued on the NIC is 12 MB.
>    And pfifo_fast qdisc will drop packets anyway.
> 
>    That's what we call 'bufferbloat'.
> 
> 2) Try lower values like 64K. Still bufferbloat.
> 
> 3) Fix buggy drivers, using proper logic or shorter timers (the mvneta
> case, for example).
> 
> 4) Add a new netdev attribute, so that well behaving NIC drivers do not
> have to artificially force TCP stack to queue too many bytes in
> Qdisc/NIC queues.

Or, maybe:
5) Call skb_orphan() (I think that is the correct function) when transmit
   packets are given to the hardware.
   I think that if the MAC driver supports BQL this could be done as soon
   as the BQL resource is assigned to the packet.
   I suspect it could be done unconditionally.
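A minimal sketch of where such a call could sit in a driver's ndo_start_xmit path, assuming a BQL-aware driver. Only skb_orphan(), netdev_tx_sent_queue() and netdev_get_tx_queue() are real kernel APIs here; every foo_* name is illustrative, not an actual driver:

```c
/* Hypothetical driver xmit path; all foo_* names are illustrative. */
static netdev_tx_t foo_start_xmit(struct sk_buff *skb,
				  struct net_device *dev)
{
	struct foo_priv *priv = netdev_priv(dev);

	if (foo_tx_ring_full(priv))
		return NETDEV_TX_BUSY;

	/* Charge the packet's bytes against the BQL limit for this queue. */
	netdev_tx_sent_queue(netdev_get_tx_queue(dev, 0), skb->len);

	/* Once BQL has accounted the packet, drop the socket's reference
	 * so sk_wmem_alloc no longer throttles the TCP session on the
	 * (possibly slow) hardware completion path. */
	skb_orphan(skb);

	foo_hw_post_descriptor(priv, skb);
	return NETDEV_TX_OK;
}
```

The point of the ordering is that BQL, not socket write-memory accounting, then bounds how much data sits in the NIC queue.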

Clearly the skb may also need to be freed to allow protocol
retransmissions to complete properly - but that won't be so
timing-critical.

I remember (a long time ago) getting a measurable performance increase
by disabling the 'end of transmit' interrupt and only doing tx tidy-up
when the driver was active for other reasons.
There were two reasons for enabling the interrupt:
1) the tx ring was full.
2) a tx buffer had a user-defined delete function.
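That scheme could be sketched roughly as follows. Everything here is a hypothetical ring API (all foo_* names are assumptions); only dev_consume_skb_any() and the skb->destructor field are real kernel interfaces:

```c
/* Hypothetical lazy TX reclaim; all foo_* names are illustrative. */
static void foo_tx_clean(struct foo_priv *priv)
{
	/* Reclaim completed descriptors without a completion interrupt. */
	while (foo_tx_ring_has_completed(priv))
		dev_consume_skb_any(foo_tx_ring_pop(priv));
}

static netdev_tx_t foo_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct foo_priv *priv = netdev_priv(dev);

	/* Piggyback tidy-up on activity we are handling anyway. */
	foo_tx_clean(priv);

	if (foo_tx_ring_full(priv) || skb->destructor) {
		/* Reason 1: tx ring full.
		 * Reason 2: the buffer has a user-defined delete function
		 * that must run promptly.
		 * Only in these cases pay for the completion interrupt. */
		foo_enable_tx_irq(priv);
	}

	foo_hw_post_descriptor(priv, skb);
	return NETDEV_TX_OK;
}
```

The interrupt stays masked in the common case, so completions are batched for free with the next transmit or receive activity.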

	David
