Message-Id: <1189998103.4230.76.camel@localhost>
Date: Sun, 16 Sep 2007 23:01:43 -0400
From: jamal <hadi@...erus.ca>
To: David Miller <davem@...emloft.net>
Cc: krkumar2@...ibm.com, johnpol@....mipt.ru,
herbert@...dor.apana.org.au, kaber@...sh.net,
shemminger@...ux-foundation.org, jagana@...ibm.com,
Robert.Olsson@...a.slu.se, rick.jones2@...com, xma@...ibm.com,
gaagaan@...il.com, netdev@...r.kernel.org, rdreier@...co.com,
peter.p.waskiewicz.jr@...el.com, mcarlson@...adcom.com,
jeff@...zik.org, mchan@...adcom.com, general@...ts.openfabrics.org,
kumarkr@...ux.ibm.com, tgraf@...g.ch, randy.dunlap@...cle.com,
sri@...ibm.com
Subject: Re: [PATCH 0/10 REV5] Implement skb batching and support in
IPoIB/E1000
On Sun, 2007-16-09 at 19:25 -0700, David Miller wrote:
> There are tertiary issues I'm personally interested in, for example
> how well this stuff works when we enable software GSO on a non-TSO
> capable card.
>
> In such a case the GSO segment should be split right before we hit the
> driver and then all the sub-segments of the original GSO frame batched
> in one shot down to the device driver.
I think GSO is still useful on top of this.
In my patches, anything with GSO gets put onto the batch list and shot
down to the driver. I've never considered checking whether the NIC is
TSO capable; that may be something worth looking into. The NetIron lets
you shove up to 128 skbs using one tx descriptor, which makes for
interesting possibilities.
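For illustration, here is a minimal sketch (kernel-style C, not the
actual patch code) of the idea David describes above: software-segment
a GSO skb right before the driver and hand all the sub-segments down in
one shot. The ->hard_batch_xmit() hook name is an assumption made for
the sketch; the real batching entry point in these patches may be named
differently.

#include <linux/err.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

static int xmit_gso_as_batch(struct sk_buff *skb, struct net_device *dev)
{
	struct sk_buff_head blist;
	struct sk_buff *segs, *next;

	skb_queue_head_init(&blist);

	/* Split the GSO super-packet into MTU-sized segments in software. */
	segs = skb_gso_segment(skb, dev->features);
	if (IS_ERR(segs))
		return PTR_ERR(segs);
	if (!segs) {
		/* Nothing to split: batch the original skb by itself. */
		__skb_queue_tail(&blist, skb);
		return dev->hard_batch_xmit(&blist, dev); /* assumed hook */
	}

	/* Collect every sub-segment on one list ... */
	while (segs) {
		next = segs->next;
		segs->next = NULL;
		__skb_queue_tail(&blist, segs);
		segs = next;
	}
	kfree_skb(skb);	/* the segments hold their own data references */

	/* ... and shoot the whole list down to the driver in one call,
	 * instead of one ->hard_start_xmit() per segment. */
	return dev->hard_batch_xmit(&blist, dev); /* assumed hook */
}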
> In this way you'll get a large chunk of the benefit of TSO without
> explicit hardware support for the feature.
>
> There are several cards (some even 10GB) that will benefit immensely
> from this.
Indeed - I've always wondered whether batching this way would make the
NICs behave differently than TSO does.
On a side note: my observation is that with large packets on a very
busy system running a bulk-transfer type app, one approaches wire speed
with or without batching, and the apps are mostly idling (I've seen up
to 90% idle time spent polling at the socket level for a write to
complete on a really busy system). CPU utilization seems a little
better with batching. As the apps' aggregation gets more aggressive
(achievable by reducing their packet sizes), one sees improved
throughput and reduced CPU utilization. This is all with UDP; I am
still studying TCP.
cheers,
jamal