Message-ID: <1383337612.3042.21.camel@joe-AO722>
Date: Fri, 01 Nov 2013 13:26:52 -0700
From: Joe Perches <joe@...ches.com>
To: Neil Horman <nhorman@...driver.com>
Cc: David Laight <David.Laight@...LAB.COM>,
Ben Hutchings <bhutchings@...arflare.com>,
Doug Ledford <dledford@...hat.com>,
Ingo Molnar <mingo@...nel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
On Fri, 2013-11-01 at 15:58 -0400, Neil Horman wrote:
> On Fri, Nov 01, 2013 at 12:45:29PM -0700, Joe Perches wrote:
> > On Fri, 2013-11-01 at 13:37 -0400, Neil Horman wrote:
> >
> > > I think it would be better if we just did the prefetch here
> > > and re-addressed this area when AVX (or addcx/addox) instructions were available
> > > for testing on hardware.
> >
> > Could there be a difference if only a single software
> > prefetch was done at the beginning of transfer before
> > the while loop and hardware prefetches did the rest?
> >
> I wouldn't think so. If hardware was going to do any prefetching based on
> memory access patterns it will do so regardless of the leading prefetch, and
> that first prefetch isn't helpful because we still wind up stalling on the adds
> while its completing
I imagine one benefit would be avoiding
prefetches beyond the data actually required.
Maybe some hardware picks a better prefetch
stride than 5*64.
I wonder also if using
	if (count > some_length)
		prefetch
	while (...)
helps small lengths more than the test/jump cost.
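
As a rough user-space sketch of that shape (not the kernel's
csum_partial; sum_words, PREFETCH_THRESHOLD and PREFETCH_AHEAD are
hypothetical names and values picked only for illustration, and
__builtin_prefetch assumes GCC or Clang):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff: buffers at or below this many bytes skip the prefetch. */
#define PREFETCH_THRESHOLD 256
/* Hypothetical distance, in bytes, of the single leading prefetch. */
#define PREFETCH_AHEAD 64

/*
 * Sum 32-bit words into a 64-bit accumulator, then fold the total
 * down to 16 bits with end-around carry.
 */
static uint16_t sum_words(const uint32_t *buf, size_t count)
{
	uint64_t sum = 0;

	/* One software prefetch before the loop, only for larger buffers. */
	if (count * sizeof(*buf) > PREFETCH_THRESHOLD)
		__builtin_prefetch((const char *)buf + PREFETCH_AHEAD);

	/* No prefetch inside the loop; hardware prefetch is left to do the rest. */
	while (count--)
		sum += *buf++;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

Whether the extra compare and branch pays for itself on short
buffers would still have to be measured.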