Date:	Wed, 08 Aug 2012 18:44:27 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Paul Gortmaker <paul.gortmaker@...driver.com>
Cc:	Claudiu Manoil <claudiu.manoil@...escale.com>,
	netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>
Subject: Re: [RFC net-next 0/4] gianfar: Use separate NAPI for Tx
 confirmation processing

On Wed, 2012-08-08 at 12:24 -0400, Paul Gortmaker wrote:
> [[RFC net-next 0/4] gianfar: Use separate NAPI for Tx confirmation processing] On 08/08/2012 (Wed 15:26) Claudiu Manoil wrote:
> 
> > Hi all,
> > This set of patches basically splits the existing napi poll routine into
> > two separate napi functions, one for Rx processing (triggered by frame
> > receive interrupts only) and one for the Tx confirmation path processing
> > (triggered by Tx confirmation interrupts only). The underlying polling
> > algorithm remains much the same.
> > 
> > Significant throughput improvements have been observed on low-power
> > boards with this set of changes.
> > For instance, the following netperf test:
> > netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> > yields a throughput gain from an oscillating ~500-700 Mbps to a steady
> > ~940 Mbps (if the Rx/Tx paths are processed on different cores), with
> > no increase in CPU utilization, on a p1020rdb, a 2-core machine
> > featuring etsec2.0 (Multi-Queue Multi-Group driver mode).
> 
> It would be interesting to know more about what was causing that large
> an oscillation -- presumably you will have it reappear once one core
> becomes 100% utilized.  Also, any thoughts on how the change will affect
> performance on an older low-power single-core gianfar system (e.g. 83xx)?
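
For reference, the split described above comes down to registering two
NAPI contexts per interrupt group, one per interrupt source, so that Rx
and Tx confirmation work can be scheduled independently (and land on
different cores). A minimal sketch -- the gfar_*-style helper names are
illustrative, not taken from the actual patches:

static int gfar_poll_rx(struct napi_struct *napi, int budget)
{
	struct gfar_priv_grp *grp =
		container_of(napi, struct gfar_priv_grp, napi_rx);
	int work = gfar_clean_rx_ring(grp->rx_queue, budget);

	if (work < budget) {
		napi_complete(napi);
		gfar_enable_rx_irqs(grp);  /* re-arm frame-receive irqs only */
	}
	return work;
}

static int gfar_poll_tx(struct napi_struct *napi, int budget)
{
	struct gfar_priv_grp *grp =
		container_of(napi, struct gfar_priv_grp, napi_tx);

	gfar_clean_tx_ring(grp->tx_queue);  /* Tx confirmation processing */
	napi_complete(napi);
	gfar_enable_tx_irqs(grp);  /* re-arm Tx confirmation irqs only */
	return 0;
}

/* at init time: two NAPI instances instead of one */
netif_napi_add(dev, &grp->napi_rx, gfar_poll_rx, GFAR_DEV_WEIGHT);
netif_napi_add(dev, &grp->napi_tx, gfar_poll_tx, 2);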

I was also wondering if this low performance could be caused by BQL.

Since the TCP stack is driven by incoming ACKs, a NAPI run could have to
handle 10 TCP ACKs in a row, and the resulting xmits could hit the BQL
limit and have to transit through the qdisc (because the NAPI handler
won't process TX completions in the middle of the RX handler).
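
For context, BQL is the byte-queue-limits accounting a driver does
around its Tx ring; a sketch of the two calls involved (txq,
pkts_completed and bytes_completed are placeholder names):

	/* in ndo_start_xmit(): account bytes handed to the hardware;
	 * BQL stops the queue once in-flight bytes exceed its limit */
	netdev_tx_sent_queue(txq, skb->len);

	/* in the Tx confirmation path: only this call shrinks the
	 * in-flight count and can wake the queue again */
	netdev_tx_completed_queue(txq, pkts_completed, bytes_completed);

If a long RX run delays the confirmation path, the completed-queue side
never runs, the queue stays stopped, and new xmits back up on the qdisc.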

So experiments would be nice, maybe reducing
/proc/sys/net/ipv4/tcp_limit_output_bytes a bit
(from the default 131072 to 65536 or 32768).
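
A throwaway helper for that experiment, if scripting it is more
convenient than an interactive echo into the proc file (run as root;
equivalent to writing the value by hand):

#include <stdio.h>

int main(void)
{
	/* Default is 131072; try 65536 first, then 32768. */
	FILE *f = fopen("/proc/sys/net/ipv4/tcp_limit_output_bytes", "w");

	if (!f) {
		perror("tcp_limit_output_bytes");
		return 1;
	}
	fprintf(f, "65536\n");
	return fclose(f) != 0;
}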



