Message-Id: <1179267467.4080.33.camel@localhost>
Date: Tue, 15 May 2007 18:17:47 -0400
From: jamal <hadi@...erus.ca>
To: David Miller <davem@...emloft.net>
Cc: xma@...ibm.com, rdreier@...co.com, ak@...e.de, krkumar2@...ibm.com,
netdev@...r.kernel.org, netdev-owner@...r.kernel.org,
ashwin.chaugule@...unite.com,
Evgeniy Polyakov <johnpol@....mipt.ru>,
Gagan Arneja <gagan@...are.com>
Subject: [WIP] [PATCH] WAS Re: [RFC] New driver API to speed up small
packets xmits
On Tue, 2007-15-05 at 14:32 -0700, David Miller wrote:
> An efficient qdisc-->driver
> transfer during netif_wake_queue() could help solve some of that,
> as is being discussed here.
Ok, here's the approach I discussed at netconf.
It needs net-2.6 and the patch I posted earlier to clean up
qdisc_restart() [1].
I haven't ported over all the bits from 2.6.18, but this works.
Krishna and I have colluded privately on working together; I just need
to regenerate the patches, so here is the core.
A lot of the code in the core could be aggregated later - right now I am
worried about correctness.
I will post a patch for the tun device in a few minutes,
which I use to test on my laptop (I need to remove some debug code), to
serve as an example.
I also plan to post a patch for e1000 - but that will take more
than a few minutes: the e1000 driver has changed quite a bit since
2.6.18, so porting it is time-consuming.
What does a driver need to do to get batched-to?
1) On initialization (probably at probe time):
a) set NETIF_F_BTX in its dev->features, i.e.
dev->features |= NETIF_F_BTX
b) initialize the batch queue, i.e. something like
skb_queue_head_init(&dev->blist);
c) set dev->xmit_win to something reasonable, like
maybe half the DMA ring size or tx_queue_len
2) Create a new method for batch xmit.
This loops on dev->blist and stashes onto hardware.
All return codes like NETDEV_TX_OK etc. still apply.
3) Set dev->xmit_win, which provides a hint on how much
data to send from the core to the driver. Some suggestions:
a) on doing a netif_stop_queue, set it to 1
b) on netif_wake_queue, set it to the max available space
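
Put together, a driver's side of steps 1-3 might look roughly like the
sketch below. NETIF_F_BTX, dev->blist and dev->xmit_win are from this
WIP API (not mainline), and MY_TX_RING_SIZE / my_probe /
my_hard_batch_xmit are made-up driver names used only for illustration:

```c
/* Sketch only: NETIF_F_BTX, dev->blist and dev->xmit_win come from this
 * WIP patch, not mainline; MY_TX_RING_SIZE, my_probe and
 * my_hard_batch_xmit are hypothetical driver names. */

static int my_probe(struct net_device *dev)
{
	dev->features |= NETIF_F_BTX;        /* 1a: advertise batching */
	skb_queue_head_init(&dev->blist);    /* 1b: per-device batch list */
	dev->xmit_win = MY_TX_RING_SIZE / 2; /* 1c: a reasonable start */
	...
}

/* 2: drain the batch list onto the hardware */
static int my_hard_batch_xmit(struct net_device *dev)
{
	struct sk_buff *skb;

	while ((skb = __skb_dequeue(&dev->blist)) != NULL) {
		... /* stash skb onto the DMA ring; stop the queue if full */
	}
	return NETDEV_TX_OK; /* the usual NETDEV_TX_* return codes apply */
}
```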
Of course, for all this to work, the driver needs a
threshold for waking up the tx path, as drivers such as e1000 or tg3 do
before invoking netif_wake_queue (for an example, look at the
TX_WAKE_THRESHOLD usage in e1000).
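
The interplay between the wake threshold and the xmit_win hints (1 on
stop, max free space on wake) can be modelled in plain user space. The
sketch below is only an analogy under those assumptions - fake_dev,
batch_xmit and tx_clean are made-up names, not driver code:

```c
#include <assert.h>

/* Toy model of the batching handoff: a fixed-size "ring" stands in for
 * the DMA ring, and xmit_win tracks the hints described above. */
struct fake_dev {
	int ring_size;   /* DMA ring capacity */
	int ring_used;   /* descriptors in flight */
	int xmit_win;    /* hint: how much the core may push */
	int wake_thresh; /* free slots needed before waking the queue */
	int stopped;     /* queue-stopped flag */
};

/* Drain up to 'batch' packets into the ring; on filling the ring, stop
 * the queue and set xmit_win to 1 (suggestion 3a). Returns packets sent. */
static int batch_xmit(struct fake_dev *dev, int batch)
{
	int sent = 0;

	while (sent < batch && dev->ring_used < dev->ring_size) {
		dev->ring_used++;
		sent++;
	}
	if (dev->ring_used == dev->ring_size) {
		dev->stopped = 1;
		dev->xmit_win = 1;
	}
	return sent;
}

/* Tx-completion reclaim: wake only once free space crosses the
 * threshold, then set xmit_win to the max available space (3b). */
static void tx_clean(struct fake_dev *dev, int done)
{
	dev->ring_used -= done;
	if (dev->stopped &&
	    dev->ring_size - dev->ring_used >= dev->wake_thresh) {
		dev->stopped = 0;
		dev->xmit_win = dev->ring_size - dev->ring_used;
	}
}
```

With an 8-slot ring and a wake threshold of 4, pushing 10 packets sends
8 and stops the queue with xmit_win = 1; reclaiming 2 slots is below the
threshold, while reclaiming 3 more wakes the queue with xmit_win = 5.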
Feedback welcome (preferably in the form of patches).
Anyone with a really nice tool to measure CPU utilization would help
a great deal in quantifying things. As I have said earlier, I never saw
any throughput improvement; but like TSO/GSO it may just be a CPU saving
(as was suggested at netconf).
cheers,
jamal
[1] http://marc.info/?l=linux-netdev&m=117914954911959&w=2
View attachment "batch0" of type "text/x-patch" (4513 bytes)