Message-ID: <063D6719AE5E284EB5DD2968C1650D6D0F6E557E@AcuExch.aculab.com>
Date: Fri, 21 Mar 2014 14:14:23 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Eric Dumazet' <eric.dumazet@...il.com>,
Andi Kleen <andi@...stfloor.org>,
"H. Peter Anvin" <hpa@...or.com>
CC: Patrick McHardy <kaber@...sh.net>,
Herbert Xu <herbert@...dor.apana.org.au>,
"H.K. Jerry Chu" <hkchu@...gle.com>,
"Michael Dalton" <mwdalton@...gle.com>,
netdev <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [RFC] csum experts, csum_replace2() is too expensive
From: Eric Dumazet
> On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote:
> > Eric Dumazet <eric.dumazet@...il.com> writes:
> > >
> > > I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that
> > > is insane...
> >
> >
> > Couldn't it just be the cache miss?
>
> Or the fact that we mix 16 bit stores and 32bit loads ?
>
> BTW, any idea why ip_fast_csum() on x86 contains a "memory" constraint ?
The correct constraint would be one that tells gcc that the asm
reads the 20 bytes at the source pointer.
Without such a constraint (or the "memory" clobber), gcc won't
necessarily write pending stores to those bytes out to memory
before the asm instructions execute.
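
As a rough sketch of what I mean (this is not the kernel's ip_fast_csum();
the names csum20 and struct iphdr20 are made up for illustration, and it
assumes a fixed 20-byte header), an "m" input operand of a 20-byte type
tells gcc exactly which bytes the asm reads, so no "memory" clobber is needed:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: names the 20 bytes of a minimal IPv4 header. */
struct iphdr20 { unsigned char b[20]; };

/* One's-complement checksum over exactly 20 bytes (illustrative only). */
static uint16_t csum20(const void *hdr)
{
	uint64_t acc;
#if defined(__x86_64__) || defined(__i386__)
	uint32_t sum;

	/*
	 * Operand %2 is never used in the template. It exists only to tell
	 * gcc that the asm reads the 20 bytes at hdr, so any pending stores
	 * to them must reach memory first.
	 */
	__asm__("movl   (%1), %0\n\t"
		"addl  4(%1), %0\n\t"
		"adcl  8(%1), %0\n\t"
		"adcl 12(%1), %0\n\t"
		"adcl 16(%1), %0\n\t"
		"adcl $0, %0"
		: "=&r" (sum)		/* early clobber: written before hdr is last read */
		: "r" (hdr), "m" (*(const struct iphdr20 *)hdr)
		: "cc");
	acc = sum;
#else
	/* Plain C fallback for other architectures. */
	uint32_t w[5];
	int i;

	memcpy(w, hdr, sizeof(w));
	acc = 0;
	for (i = 0; i < 5; i++)
		acc += w[i];
#endif
	/* Fold the carries back in until the sum fits in 16 bits. */
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)~acc;
}
```

With only the "r" (hdr) operand and no "m" operand or "memory" clobber,
gcc would be free to keep an earlier store to the header in a register
(or move it after the asm), and the checksum could be computed over stale bytes.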
David