Message-Id: <201108141313.56926.vda.linux@googlemail.com>
Date: Sun, 14 Aug 2011 13:13:56 +0200
From: Denys Vlasenko <vda.linux@...glemail.com>
To: Borislav Petkov <bp@...en8.de>
Cc: Ingo Molnar <mingo@...e.hu>, melwyn lobo <linux.melwyn@...il.com>,
linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
borislav.petkov@....com
Subject: Re: x86 memcpy performance
On Sunday 14 August 2011 11:59, Borislav Petkov wrote:
> Here's the SSE memcpy version I got so far, I haven't wired in the
> proper CPU feature detection yet because we want to run more benchmarks
> like netperf and stuff to see whether we see any positive results there.
>
> The SYSTEM_RUNNING check is to take care of early boot situations where
> we can't handle FPU exceptions but we use memcpy. There's an aligned and
> a misaligned variant which should handle any buffers and sizes, although
> I've set the SSE memcpy threshold at a buffer size of at least 512 bytes,
> to somewhat amortize the FPU context save/restore cost.
>
> Comments are much appreciated! :-)
>
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h
> @@ -28,10 +28,20 @@ static __always_inline void *__inline_memcpy(void *to, const void *from, size_t
>
> #define __HAVE_ARCH_MEMCPY 1
> #ifndef CONFIG_KMEMCHECK
> +extern void *__memcpy(void *to, const void *from, size_t len);
> +extern void *__sse_memcpy(void *to, const void *from, size_t len);
> #if (__GNUC__ == 4 && __GNUC_MINOR__ >= 3) || __GNUC__ > 4
> -extern void *memcpy(void *to, const void *from, size_t len);
> +#define memcpy(dst, src, len) \
> +({ \
> + size_t __len = (len); \
> + void *__ret; \
> + if (__len >= 512) \
> + __ret = __sse_memcpy((dst), (src), __len); \
> + else \
> + __ret = __memcpy((dst), (src), __len); \
> + __ret; \
> +})
Please, no. Do not inline every memcpy invocation.
This is pure bloat (considering how many memcpy calls there are)
and it doesn't even win anything in speed, since there will be
a function call either way.

Put the __len >= 512 check inside your memcpy instead.

You may still do the check at the call site if you know that __len is
constant:

  if (__builtin_constant_p(__len) && __len >= 512) ...

because in this case gcc will evaluate it at compile time.
--
vda