Message-ID: <1383929467.2639.14.camel@joe-AO722>
Date: Fri, 08 Nov 2013 08:51:07 -0800
From: Joe Perches <joe@...ches.com>
To: Neil Horman <nhorman@...driver.com>
Cc: Dave Jones <davej@...hat.com>, linux-kernel@...r.kernel.org,
sebastien.dugue@...l.net, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org
Subject: Re: [PATCH v2 2/2] x86: add prefetching to do_csum

On Fri, 2013-11-08 at 11:25 -0500, Neil Horman wrote:
> On Wed, Nov 06, 2013 at 12:07:38PM -0800, Joe Perches wrote:
> > On Wed, 2013-11-06 at 15:02 -0500, Neil Horman wrote:
> > > On Wed, Nov 06, 2013 at 09:19:23AM -0800, Joe Perches wrote:
> > []
> > > > __always_inline instead of inline
> > > > static __always_inline void prefetch_lines(const void *addr, size_t len)
> > > > {
> > > > const void *end = addr + len;
> > > > ...
> > > >
> > > > buff doesn't need a void * cast in prefetch_lines
> > > >
> > > Actually I take back what I said here, we do need the cast, not for a conversion
> > > from unsigned char * to void *, but rather to discard the const qualifier
> > > without making the compiler complain.
> >
> > Not if the function is changed to const void *
> > and end is also const void * as shown.
> >
> Addr is incremented in the for loop, so it can't be const. I could add a loop
> counter variable on the stack, but that doesn't seem like it would help anything
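
Perhaps you meant
	void * const addr;
which makes the pointer itself const, but that's not what I wrote.
const void *addr only makes the pointed-to data const, so addr
can still be incremented in the loop.

Let me know if this doesn't compile.
It does here...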
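
To illustrate the distinction outside the kernel, here is a minimal
user-space sketch. It is not part of the patch: count_lines() and its
"line" parameter are hypothetical names, and the parameter is declared as
const unsigned char * only so the pointer arithmetic stays within
standard C (the kernel relies on GCC's void * arithmetic extension).

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: walk [addr, addr + len)
 * in steps of "line" bytes and return the number of steps taken.
 *
 * "const unsigned char *addr" is a modifiable pointer to const data,
 * so incrementing addr is legal; only writes through it are rejected.
 * Declaring it "unsigned char * const addr" would instead make the
 * pointer itself const, and the compiler would reject "addr += line".
 */
static size_t count_lines(const unsigned char *addr, size_t len, size_t line)
{
	const unsigned char *end = addr + len;
	size_t n = 0;

	for (; addr < end; addr += line)
		n++;

	return n;
}
```

With a 256-byte buffer and a step of 64, the loop body runs four times,
and no cast is needed at the call site to pass a const buffer.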
---
arch/x86/lib/csum-partial_64.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
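
For anyone who wants to experiment with the idea outside the kernel, the
following is a rough user-space sketch of the prefetch_lines() helper from
the patch below. It makes several assumptions that are not in the patch:
a fixed 64-byte line (ASSUMED_CACHE_LINE) stands in for cache_line_size(),
the GCC/Clang builtin __builtin_prefetch() stands in for the inline asm,
and the helpers return a count purely so the behaviour can be checked.

```c
#include <stddef.h>

/* Assumption: 64-byte lines; the kernel queries cache_line_size(). */
#define ASSUMED_CACHE_LINE 64u

/*
 * Sketch of prefetch_lines(): issue one prefetch hint per assumed
 * cache line in [addr, addr + len) and return how many were issued.
 */
static size_t prefetch_lines_sketch(const void *addr, size_t len)
{
	const char *p = addr;
	const char *end = p + len;
	size_t issued = 0;

	for (; p < end; p += ASSUMED_CACHE_LINE) {
		__builtin_prefetch(p);	/* a hint only; never faults */
		issued++;
	}

	return issued;
}

/*
 * Hypothetical wrapper mirroring the call site in do_csum(): prefetch
 * at most the first four lines of the buffer.
 */
static size_t prefetch_head(const void *buf, size_t len)
{
	size_t cap = 4u * ASSUMED_CACHE_LINE;

	return prefetch_lines_sketch(buf, len < cap ? len : cap);
}
```

Capping the range at four lines keeps the cost bounded for large
buffers, while short buffers only touch the lines they actually span.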
diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 9845371..891194a 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -29,8 +29,15 @@ static inline unsigned short from32to16(unsigned a)
  * Things tried and found to not make it faster:
  * Manual Prefetching
  * Unrolling to an 128 bytes inner loop.
- * Using interleaving with more registers to break the carry chains.
  */
+
+static __always_inline void prefetch_lines(const void * addr, size_t len)
+{
+	const void *end = addr + len;
+	for (; addr < end; addr += cache_line_size())
+		asm("prefetch 0(%[buf])\n\t" : : [buf] "r" (addr));
+}
+
 static unsigned do_csum(const unsigned char *buff, unsigned len)
 {
 	unsigned odd, count;
@@ -67,7 +74,9 @@ static unsigned do_csum(const unsigned char *buff, unsigned len)
 			/* main loop using 64byte blocks */
 			zero = 0;
 			count64 = count >> 3;
-			while (count64) {
+
+			prefetch_lines(buff, min(len, cache_line_size() * 4u));
+			while (count64) {
 				asm("addq 0*8(%[src]),%[res]\n\t"
 				    "adcq 1*8(%[src]),%[res]\n\t"
 				    "adcq 2*8(%[src]),%[res]\n\t"