Date: Tue, 09 Oct 2007 17:50:25 -0700 (PDT)
From: David Miller <davem@...emloft.net>
To: andi@...stfloor.org
Cc: hadi@...erus.ca, shemminger@...ux-foundation.org, jeff@...zik.org,
johnpol@....mipt.ru, herbert@...dor.apana.org.au,
gaagaan@...il.com, Robert.Olsson@...a.slu.se,
netdev@...r.kernel.org, rdreier@...co.com,
peter.p.waskiewicz.jr@...el.com, mcarlson@...adcom.com,
jagana@...ibm.com, general@...ts.openfabrics.org,
mchan@...adcom.com, tgraf@...g.ch, randy.dunlap@...cle.com,
sri@...ibm.com, kaber@...sh.net
Subject: Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
From: Andi Kleen <andi@...stfloor.org>
Date: Wed, 10 Oct 2007 02:37:16 +0200
> On Tue, Oct 09, 2007 at 05:04:35PM -0700, David Miller wrote:
> > We have to keep in mind, however, that the sw queue right now is 1000
> > packets. I heavily discourage any driver author to try and use any
> > single TX queue of that size.
>
> Why would you discourage them?
>
> If 1000 is ok for a software queue why would it not be ok
> for a hardware queue?
Because with the software queue, you aren't accessing 1000 slots
shared with the hardware device which does shared-ownership
transactions on those L2 cache lines with the cpu.
Long ago I did a test on gigabit on a cpu with only 256K of
L2 cache. Using a smaller TX queue made things go faster,
and it's exactly because of these L2 cache effects.
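As a rough illustration of the footprint argument, the sketch below counts how many cache lines the descriptors of a TX ring would span. The 16-byte descriptor and 64-byte cache line are assumptions made for the example, not figures from this thread, and the helper names are hypothetical. Packet buffers and per-packet metadata add to the footprint and are not counted here.

```c
#include <stddef.h>

/* Assumed sizes for illustration only; real hardware differs. */
#define DESC_BYTES       16u  /* hypothetical TX descriptor size */
#define CACHE_LINE_BYTES 64u  /* hypothetical cache line size */

/* Bytes of descriptor memory occupied by a ring of `entries` slots. */
static size_t ring_bytes(size_t entries)
{
    return entries * DESC_BYTES;
}

/* Cache lines spanned by those descriptors, rounded up. */
static size_t ring_cache_lines(size_t entries)
{
    return (ring_bytes(entries) + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

Under these assumptions a 256-entry ring spans 64 cache lines and a 1000-entry ring spans 250, every one of which changes ownership between the cpu and the device as packets are queued and completed.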
> 1000 packets is a lot. I don't have hard data, but gut feeling
> is less would also do.
I'll try to see how backlogged my 10Gb tests get when a strong
sender is sending to a weak receiver.
> And if the hw queues are not enough a better scheme might be to
> just manage this in the sockets in sendmsg. e.g. provide a wait queue that
> drivers can wake up and let them block on more queue.
TCP does this already, but it operates in a lossy manner.
> I don't really see the advantage over the qdisc in that scheme.
> It's certainly not simpler and probably more code and would likely
> also not require less locks (e.g. a currently lockless driver
> would need a new lock for its sw queue). Also it is unclear to me
> it would be really any faster.
You still need a lock to guard hw TX enqueue from hw TX reclaim.
A 256-entry TX hw queue fills up trivially on 1Gb and 10Gb, but if you
increase the size much more, performance starts to go down due to L2
cache thrashing.
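The point about needing a lock between hw TX enqueue and hw TX reclaim can be sketched in user-space C as follows. This is a minimal model and not code from any real driver: the struct, the function names, and the C11 `atomic_flag` spinlock standing in for a kernel spinlock are all assumptions made for the example. The 256-entry size matches the figure mentioned above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_RING_SIZE 256u  /* power of two, so index masking works */

/* Hypothetical TX ring; not taken from any real driver. */
struct tx_ring {
    atomic_flag lock;          /* stand-in for the driver's TX spinlock */
    unsigned int head;         /* next slot the cpu fills */
    unsigned int tail;         /* next slot to reclaim after completion */
    void *slots[TX_RING_SIZE];
};

static void tx_ring_init(struct tx_ring *r)
{
    atomic_flag_clear(&r->lock);
    r->head = 0;
    r->tail = 0;
}

static void tx_lock(struct tx_ring *r)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ;
}

static void tx_unlock(struct tx_ring *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Queue one packet; returns false when all slots are in use. */
static bool tx_enqueue(struct tx_ring *r, void *pkt)
{
    bool ok = false;

    tx_lock(r);
    if (r->head - r->tail < TX_RING_SIZE) {
        r->slots[r->head & (TX_RING_SIZE - 1)] = pkt;
        r->head++;
        ok = true;
    }
    tx_unlock(r);
    return ok;
}

/* Reclaim up to `done` completed slots; returns how many were freed. */
static unsigned int tx_reclaim(struct tx_ring *r, unsigned int done)
{
    unsigned int n = 0;

    tx_lock(r);
    while (n < done && r->tail != r->head) {
        r->slots[r->tail & (TX_RING_SIZE - 1)] = NULL;
        r->tail++;
        n++;
    }
    tx_unlock(r);
    return n;
}
```

In this model the transmit path calls `tx_enqueue()` and the completion path calls `tx_reclaim()`; both touch `head`, `tail`, and the slots, which is why they share one lock. When `tx_enqueue()` returns false the ring is full and the caller has to hold the packet elsewhere, which is the role the software queue plays.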