Date: Wed, 10 Oct 2007 02:25:50 -0700 (PDT)
From: David Miller <davem@...emloft.net>
To: andi@...stfloor.org
Cc: hadi@...erus.ca, shemminger@...ux-foundation.org, jeff@...zik.org,
johnpol@....mipt.ru, herbert@...dor.apana.org.au,
gaagaan@...il.com, Robert.Olsson@...a.slu.se,
netdev@...r.kernel.org, rdreier@...co.com,
peter.p.waskiewicz.jr@...el.com, mcarlson@...adcom.com,
jagana@...ibm.com, general@...ts.openfabrics.org,
mchan@...adcom.com, tgraf@...g.ch, randy.dunlap@...cle.com,
sri@...ibm.com, kaber@...sh.net
Subject: Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
From: Andi Kleen <andi@...stfloor.org>
Date: Wed, 10 Oct 2007 11:16:44 +0200
> > A 256 entry TX hw queue fills up trivially at 1Gbit and 10Gbit, but if you
>
> With TSO really?
Yes.
> > increase the size much more performance starts to go down due to L2
> > cache thrashing.
>
> Another possibility would be to consider using cache avoidance
> instructions while updating the TX ring (e.g. write combining
> on x86)
The chip I was working with at the time (UltraSPARC-IIi) compressed
all the linear stores into 64-byte full cacheline transactions via
the store buffer.
It's true that it would allocate in the L2 cache on a miss, which
is different from your suggestion.
In fact, such a thing might not pan out well, because most of the time
you write a single descriptor or two, and that isn't a full cacheline,
which means a read/modify/write is the only coherent way to make such
a write to RAM.
Sure you could batch, but I'd rather give the chip work to do unless
I unequivocally knew I'd have enough pending to fill a cacheline's
worth of descriptors. And since you suggest we shouldn't queue in
software... :-)