Message-Id: <1191331238.4353.59.camel@localhost>
Date: Tue, 02 Oct 2007 09:20:38 -0400
From: jamal <hadi@...erus.ca>
To: Bill Fink <billfink@...dspring.com>
Cc: David Miller <davem@...emloft.net>, krkumar2@...ibm.com,
johnpol@....mipt.ru, herbert@...dor.apana.org.au, kaber@...sh.net,
shemminger@...ux-foundation.org, jagana@...ibm.com,
Robert.Olsson@...a.slu.se, rick.jones2@...com, xma@...ibm.com,
gaagaan@...il.com, netdev@...r.kernel.org, rdreier@...co.com,
peter.p.waskiewicz.jr@...el.com, mcarlson@...adcom.com,
jeff@...zik.org, mchan@...adcom.com, general@...ts.openfabrics.org,
kumarkr@...ux.ibm.com, tgraf@...g.ch, randy.dunlap@...cle.com,
sri@...ibm.com
Subject: Re: [PATCH 2/3][NET_BATCH] net core use batching
On Tue, 2007-02-10 at 00:25 -0400, Bill Fink wrote:
> One reason I ask, is that on an earlier set of alternative batching
> xmit patches by Krishna Kumar, his performance testing showed a 30 %
> performance hit for TCP for a single process and a size of 4 KB, and
> a performance hit of 5 % for a single process and a size of 16 KB
> (a size of 8 KB wasn't tested). Unfortunately I was too busy at the
> time to inquire further about it, but it would be a major potential
> concern for me in my 10-GigE network testing with 9000-byte jumbo
> frames. Of course the single process and 4 KB or larger size was
> the only case that showed a significant performance hit in Krishna
> Kumar's latest reported test results, so it might be acceptable to
> just have a switch to disable the batching feature for that specific
> usage scenario. So it would be useful to know if your xmit batching
> changes would have similar issues.
There were many times while testing that I noticed inconsistencies, and
in each case when I analysed them[1] I found the cause to be some variable
other than batching which needed resolving, always via some
parametrization or other. I suspect what KK posted is in the same class.

To give you an example: with UDP, batching was giving worse results at
around 256B compared to 64B or 512B. Investigating, I found that the
receiver just wasn't able to keep up and the UDP layer dropped a lot of
packets, so both iperf and netperf reported bad numbers. Fixing the
receiver brought the consistency back. Why did 256B overwhelm the
receiver more than 64B (which sent more pps)? From some limited
investigation, it seemed to me to be the effect of the tg3 driver's
default tx mitigation parameters as well as the tx ring size; that is
something I plan to revisit (but neutralizing it helps me focus on just
batching). In the end I dropped both netperf and iperf for similar
reasons and wrote my own app. What I am trying to achieve is to
demonstrate whether batching is a GoodThing. In experimentation like
this, it is extremely valuable to reduce the variables. Batching may
expose other orthogonal issues - those need to be resolved or fixed
as they are found. I hope that sounds sensible.
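
(As an aside, for anyone who wants to see what their driver's tx
mitigation settings actually are: the snippet below is a minimal sketch
using the ETHTOOL_GCOALESCE ioctl - plain "ethtool -c ethX" reports the
same numbers. The "eth0" default and the exact fields printed are just
an example, not a recipe for the tuning I did.)

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/types.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "eth0"; /* example device */
	struct ethtool_coalesce ecoal = { .cmd = ETHTOOL_GCOALESCE };
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);
	ifr.ifr_data = (char *)&ecoal;

	/* ask the driver for its current interrupt mitigation settings */
	if (ioctl(fd, SIOCETHTOOL, &ifr) == 0)
		printf("%s: tx-usecs=%u tx-frames=%u\n", dev,
		       ecoal.tx_coalesce_usecs,
		       ecoal.tx_max_coalesced_frames);
	else
		perror("SIOCETHTOOL");
	return 0;
}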
Back to the >=9K packet size you raise above:
I dont have a 10Gige card so iam theorizing. Given that theres an
observed benefit to batching for a saturated link with "smaller" packets
(in my results "small" is anything below 256B which maps to about
380Kpps anything above that seems to approach wire speed and the link is
the bottleneck); then i theorize that 10Gige with 9K jumbo frames if
already achieving wire rate, should continue to do so. And sizes below
that will see improvements if they were not already hitting wire rate.
So i would say that with 10G NICS, there will be more observed
improvements with batching with apps that do bulk transfers (assuming
those apps are not seeing wire speed already). Note that this hasnt been
quiet the case even with TSO given the bottlenecks in the Linux
receivers that J Heffner put nicely in a response to some results you
posted - but that exposes an issue with Linux receivers rather than TSO.
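
(To put a number on why ~256B is where the link starts to become the
bottleneck on GigE - back-of-the-envelope, assuming "256B" means the
UDP payload:

  256 (payload) + 8 (UDP) + 20 (IPv4) + 14 (eth) + 4 (CRC)  = 302 bytes/frame
  302 + 8 (preamble) + 12 (inter-frame gap)                 = 322 bytes on the wire
  10^9 / (322 * 8)                                          ~= 388 Kpps

which lines up with the ~380Kpps I see before wire speed takes over.)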
> Also for your xmit batching changes, I think it would be good to see
> performance comparisons for TCP and IP forwarding in addition to your
> UDP pktgen tests,
That is not pktgen - it is a UDP app running in process context,
utilizing all 4 CPUs to send traffic. pktgen bypasses the stack entirely
and has its own merits in proving that batching in fact is a GoodThing,
even if it is just for traffic generation ;->
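
(For clarity on what "a UDP app in process context" means, a sketch
along these lines - illustrative only, not the actual test app; imagine
one of these pinned per CPU, with drops/pps measured at the receiver.
The 10.0.0.2 address, port 5001 and 256B default are just examples:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
	const char *dst = argc > 1 ? argv[1] : "10.0.0.2"; /* example addr */
	int size = argc > 2 ? atoi(argv[2]) : 256;         /* payload size */
	char *buf = calloc(1, size);
	struct sockaddr_in sin;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(5001);                        /* example port */
	inet_pton(AF_INET, dst, &sin.sin_addr);

	/* blast packets through the full UDP/IP stack from process
	 * context; the interesting numbers are on the receiver side */
	for (;;)
		if (sendto(fd, buf, size, 0,
			   (struct sockaddr *)&sin, sizeof(sin)) < 0) {
			perror("sendto");
			break;
		}
	return 0;
}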
> including various packet sizes up to and including
> 9000-byte jumbo frames.
I will do TCP and forwarding tests in the near future.
cheers,
jamal
[1] On average I spend 10x more time performance testing and analysing
results than writing code.