netdev - Re: [RFC net-next 0/4] gianfar: Use separate NAPI for Tx confirmation processing


Date:	Thu, 9 Aug 2012 18:07:10 +0300
From:	Claudiu Manoil <claudiu.manoil@...escale.com>
To:	Tomas Hruby <thruby@...il.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Paul Gortmaker <paul.gortmaker@...driver.com>
CC:	<netdev@...r.kernel.org>, "David S. Miller" <davem@...emloft.net>
Subject: Re: [RFC net-next 0/4] gianfar: Use separate NAPI for Tx confirmation
 processing

On 8/9/2012 2:06 AM, Tomas Hruby wrote:
> On Wed, Aug 8, 2012 at 9:44 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
>> On Wed, 2012-08-08 at 12:24 -0400, Paul Gortmaker wrote:
>>> [[RFC net-next 0/4] gianfar: Use separate NAPI for Tx confirmation processing] On 08/08/2012 (Wed 15:26) Claudiu Manoil wrote:
>>>
>>>> Hi all,
>>>> This set of patches basically splits the existing napi poll routine into
>>>> two separate napi functions, one for Rx processing (triggered by frame
>>>> receive interrupts only) and one for the Tx confirmation path processing
>>>> (triggered by Tx confirmation interrupts only). The polling algorithm
>>>> behind remains much the same.
>>>>
>>>> Important throughput improvements have been noted on low power boards with
>>>> this set of changes.
>>>> For instance, for the following netperf test:
>>>> netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
>>>> yields a throughput gain from oscillating ~500-~700 Mbps to steady ~940 Mbps,
>>>> (if the Rx/Tx paths are processed on different cores), w/ no increase in CPU%,
>>>> on a p1020rdb - 2 core machine featuring etsec2.0 (Multi-Queue Multi-Group
>>>> driver mode).
>>>
>>> It would be interesting to know more about what was causing that large
>>> an oscillation -- presumably you will have it reappear once one core
>>> becomes 100% utilized.  Also, any thoughts on how the change will change
>>> performance on an older low power single core gianfar system (e.g.  83xx)?
>>
>> I also was wondering if this low performance could be caused by BQL
>>
>> Since the TCP stack is driven by incoming ACKs, a NAPI run could have to
>> handle 10 TCP ACKs in a row, and the resulting xmits could hit BQL and
>> transit on qdisc (because the NAPI handler won't handle TX completions in
>> the middle of the RX handler)
>
> Does disabling BQL help? Is the BQL limit stable? To what value is it
> set? I would be very much interested in more data if the issue is BQL
> related.
>
> .
>

I agree that more tests should be run to investigate why gianfar
underperforms on the low power p1020rdb platform, and BQL seems to be
a good starting point (thanks for the hint). What I can say now is that
the issue is not apparent on p2020rdb, for instance, which is a more
powerful platform: its CPUs run at 1200 MHz instead of 800 MHz, its L2
cache is twice the size (512 KB), and its bus (CCB) frequency is
higher, among other things. On this board (p2020rdb) the netperf test
reaches 940 Mbps both with and without these patches.
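
To answer the BQL questions above (is the limit stable, and what is it
set to), the per-queue BQL state could be sampled while netperf runs.
Below is a minimal sketch, assuming the kernel is built with BQL and
exposes the usual byte_queue_limits directory under each tx queue in
sysfs. The function name read_bql and its sysfs_root parameter are
illustrative only and are not part of the driver or of these patches.

```python
from pathlib import Path


def read_bql(dev, sysfs_root="/sys/class/net"):
    """Return {tx queue name: {BQL attribute: value}} for one interface.

    Assumes the layout <sysfs_root>/<dev>/queues/tx-<n>/byte_queue_limits/,
    where every file (limit, limit_min, limit_max, inflight, ...) holds a
    single integer.
    """
    result = {}
    queues = Path(sysfs_root) / dev / "queues"
    for queue in sorted(queues.glob("tx-*")):
        bql_dir = queue / "byte_queue_limits"
        if not bql_dir.is_dir():
            continue  # no BQL attributes exposed for this queue
        result[queue.name] = {
            attr.name: int(attr.read_text().strip())
            for attr in sorted(bql_dir.iterdir())
            if attr.is_file()
        }
    return result
```

Calling read_bql("eth0") periodically during the test (the interface
name here is a placeholder) and logging the "limit" and "inflight"
values per queue would show whether the limit keeps moving while the
throughput oscillates.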

For a single core system I'm not expecting any performance degradation,
simply because I don't see why the proposed napi poll implementation
would be slower than the existing one. I'll do some measurements on a
p1010rdb too (single core, CPU: 800 MHz) and get back to you with the
results.

Thanks.
Claudiu



