Message-ID: <20110518063544.GC2945@elte.hu>
Date: Wed, 18 May 2011 08:35:44 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Fenghua Yu <fenghua.yu@...el.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
H Peter Anvin <hpa@...or.com>,
Asit K Mallick <asit.k.mallick@...el.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Avi Kivity <avi@...hat.com>,
Arjan van de Ven <arjan@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andi Kleen <andi@...stfloor.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy by enhanced REP
MOVSB/STOSB
* Fenghua Yu <fenghua.yu@...el.com> wrote:
> From: Fenghua Yu <fenghua.yu@...el.com>
>
> Support memcpy() with enhanced rep movsb. On processors supporting enhanced
> rep movsb, the alternative memcpy() function using enhanced rep movsb
> overrides the original function and the fast string function.
>
> Signed-off-by: Fenghua Yu <fenghua.yu@...el.com>
> ---
> arch/x86/lib/memcpy_64.S | 45 ++++++++++++++++++++++++++++++++-------------
> 1 files changed, 32 insertions(+), 13 deletions(-)
> ENDPROC(__memcpy)
>
> /*
> - * Some CPUs run faster using the string copy instructions.
> - * It is also a lot simpler. Use this when possible:
> - */
> -
> - .section .altinstructions, "a"
> - .align 8
> - .quad memcpy
> - .quad .Lmemcpy_c
> - .word X86_FEATURE_REP_GOOD
> -
> - /*
> + * Some CPUs are adding enhanced REP MOVSB/STOSB feature
> + * If the feature is supported, memcpy_c_e() is the first choice.
> + * If enhanced rep movsb copy is not available, use fast string copy
> + * memcpy_c() when possible. This is faster and code is simpler than
> + * original memcpy().
Please use more descriptive names than the cryptic and meaningless _c and _c_e
postfixes. These names do not appear many times, so there is no need to keep
them short.
Also, did you know about the 'perf bench mem memcpy' tool prototype we have in
the kernel tree? It is intended to check and evaluate exactly the patches you
are offering here. The code lives in:
tools/perf/bench/mem-memcpy-arch.h
tools/perf/bench/mem-memcpy.c
tools/perf/bench/mem-memcpy-x86-64-asm-def.h
tools/perf/bench/mem-memcpy-x86-64-asm.S
Please look into testing (and fixing, if needed), using and extending it:
- We want to measure the alternative variants as well, not just the generic one
- We want to measure memmove, memclear, etc. operations as well, not just
  memcpy
- We want cache-cold and cache-hot numbers as well, across multiple sizes
This tool can also be useful when developing these changes: they can be tested
in user-space and iterated on very quickly, without having to build and boot
the kernel.
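As a rough illustration of the kind of user-space measurement described above,
here is a minimal sketch in C. It is not the perf bench code. The names
copy_fn, bench_copy, report_sizes and EVICT_BYTES are hypothetical, and the
cache-cold case is only approximated by walking a buffer that is assumed to be
larger than the last-level cache before each copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical type: any routine with memcpy()'s signature can be timed. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t len);

/* Assumption: 64 MiB is larger than the last-level cache of the test box. */
#define EVICT_BYTES ((size_t)64 * 1024 * 1024)

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Touch one byte per 64 bytes so the copy buffers likely leave the caches. */
static void evict_caches(unsigned char *junk)
{
	size_t i;

	for (i = 0; i < EVICT_BYTES; i += 64)
		junk[i]++;
}

/*
 * Run `iters` copies of `len` bytes and return throughput in bytes/second.
 * cold == 0: reuse the same buffers back to back, so they stay cache-hot.
 * cold != 0: evict before every copy; only the copy itself is timed, so
 *            timer overhead is significant for small sizes.
 */
static double bench_copy(copy_fn fn, size_t len, int iters, int cold)
{
	unsigned char *src = malloc(len), *dst = malloc(len);
	unsigned char *junk = calloc(EVICT_BYTES, 1);
	copy_fn volatile call = fn;	/* keep the compiler from eliding copies */
	double elapsed = 0.0, t0;
	int i;

	assert(src && dst && junk);
	memset(src, 0xa5, len);
	memset(dst, 0, len);
	call(dst, src, len);		/* warm-up: fault in the pages */

	if (!cold) {
		t0 = now_sec();
		for (i = 0; i < iters; i++)
			call(dst, src, len);
		elapsed = now_sec() - t0;
	} else {
		for (i = 0; i < iters; i++) {
			evict_caches(junk);
			t0 = now_sec();
			call(dst, src, len);
			elapsed += now_sec() - t0;
		}
	}

	assert(memcmp(dst, src, len) == 0);	/* the copy must be correct */
	free(src);
	free(dst);
	free(junk);
	return elapsed > 0.0 ? (double)len * iters / elapsed : 0.0;
}

/* Print hot and cold throughput for a few sizes (64 bytes up to 256 KiB). */
static void report_sizes(copy_fn fn, const char *name)
{
	size_t len;

	for (len = 64; len <= ((size_t)1 << 20); len *= 16)
		printf("%-12s %8zu bytes: hot %8.2f MB/s, cold %8.2f MB/s\n",
		       name, len,
		       bench_copy(fn, len, 1000, 0) / 1e6,
		       bench_copy(fn, len, 10, 1) / 1e6);
}
```

Calling report_sizes(memcpy, "libc memcpy") from main() prints one line per
size. An alternative routine with the same signature can be passed in the same
way, which is what makes this kind of harness quick to iterate on.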
We can commit any enhancements/fixes you make to perf bench alongside your
memcpy patches. All in all, such measurements will make it much easier for us
to apply the patches.
Thanks,
Ingo