Message-ID: <20110518063544.GC2945@elte.hu>
Date:	Wed, 18 May 2011 08:35:44 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Fenghua Yu <fenghua.yu@...el.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	H Peter Anvin <hpa@...or.com>,
	Asit K Mallick <asit.k.mallick@...el.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Avi Kivity <avi@...hat.com>,
	Arjan van de Ven <arjan@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andi Kleen <andi@...stfloor.org>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy by enhanced REP
 MOVSB/STOSB


* Fenghua Yu <fenghua.yu@...el.com> wrote:

> From: Fenghua Yu <fenghua.yu@...el.com>
> 
> Support memcpy() with enhanced rep movsb. On processors supporting enhanced 
> rep movsb, the alternative memcpy() function using enhanced rep movsb 
> overrides the original function and the fast string function.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@...el.com>
> ---
>  arch/x86/lib/memcpy_64.S |   45 ++++++++++++++++++++++++++++++++-------------
>  1 files changed, 32 insertions(+), 13 deletions(-)

>  ENDPROC(__memcpy)
>  
>  	/*
> -	 * Some CPUs run faster using the string copy instructions.
> -	 * It is also a lot simpler. Use this when possible:
> -	 */
> -
> -	.section .altinstructions, "a"
> -	.align 8
> -	.quad memcpy
> -	.quad .Lmemcpy_c
> -	.word X86_FEATURE_REP_GOOD
> -
> -	/*
> +	 * Some CPUs have the enhanced REP MOVSB/STOSB feature.
> +	 * If the feature is supported, memcpy_c_e() is the first choice.
> +	 * If the enhanced rep movsb copy is not available, use the fast
> +	 * string copy memcpy_c() when possible. This is faster and simpler
> +	 * than the original memcpy().

Please use more obvious names than the cryptic and meaningless _c and _c_e 
postfixes. We do not repeat these names many times, so there is nothing to be 
gained by abbreviating them.
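
For illustration, here is a rough user-space sketch of the selection this 
patch implements, using self-describing, made-up names (has_erms() and 
memcpy_erms() do not exist anywhere; they are invented for this sketch). The 
kernel gets the same effect with no runtime branch, by patching the call 
site at boot via .altinstructions:

  /*
   * User-space sketch only: the kernel selects the routine at boot by
   * patching the call site, it does not branch at runtime like this.
   * has_erms() and memcpy_erms() are made-up names for this example.
   */
  #include <stddef.h>
  #include <string.h>
  #include <cpuid.h>

  static int has_erms(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* ERMS is CPUID.(EAX=7, ECX=0):EBX bit 9 */
          if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                  return 0;
          return (ebx >> 9) & 1;
  }

  /* Copy n bytes with REP MOVSB: RDI = dst, RSI = src, RCX = count */
  static void *memcpy_erms(void *dst, const void *src, size_t n)
  {
          void *ret = dst;

          asm volatile("rep movsb"
                       : "+D" (dst), "+S" (src), "+c" (n)
                       : : "memory");
          return ret;
  }

  int main(void)
  {
          /* Pick the routine once, based on the CPUID feature bit */
          void *(*memcpy_fn)(void *, const void *, size_t) =
                  has_erms() ? memcpy_erms : memcpy;
          char src[16] = "hello, world!", dst[16];

          memcpy_fn(dst, src, sizeof(src));
          return 0;
  }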

Also, did you know about the 'perf bench mem memcpy' tool prototype we have in 
the kernel tree? It is intended to check and evaluate exactly the kind of 
patches you are offering here. The code lives in:

  tools/perf/bench/mem-memcpy-arch.h
  tools/perf/bench/mem-memcpy.c
  tools/perf/bench/mem-memcpy-x86-64-asm-def.h
  tools/perf/bench/mem-memcpy-x86-64-asm.S
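
It runs entirely in user-space, for example:

  $ perf bench mem memcpy

(The exact options for buffer length and routine selection have varied 
between versions, so check the tool's --help output.)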

Please look into testing it (fixing it if needed), using it and extending it:

 - We want to measure the alternatives-patched variants as well, not just the 
   generic one

 - We want to measure memmove, memclear, etc. operations as well, not just 
   memcpy

 - We want cache-cold and cache-hot numbers as well, across multiple buffer 
   sizes (see the sketch below)
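
To illustrate that last point, a very rough user-space sketch of hot vs. 
cold measurement follows. This is an illustration of the idea only, not how 
perf bench implements it; it assumes Linux/glibc (older glibc needs -lrt for 
clock_gettime):

  /*
   * Rough sketch of cache-hot vs. cache-cold memcpy measurement.
   * Illustration only: error handling is omitted, and the "cold" loop
   * includes malloc()/memset() overhead that a real benchmark must
   * factor out.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>

  #define ITERS 100

  static double now_sec(void)
  {
          struct timespec ts;

          clock_gettime(CLOCK_MONOTONIC, &ts);
          return ts.tv_sec + ts.tv_nsec / 1e9;
  }

  int main(void)
  {
          size_t len = 1 << 20;                   /* 1 MB per copy */
          char *src = malloc(len), *dst = malloc(len);
          double t;
          int i;

          /* Cache-hot: reuse the same buffers, warm the caches first */
          memset(src, 0x5a, len);
          memcpy(dst, src, len);
          t = now_sec();
          for (i = 0; i < ITERS; i++)
                  memcpy(dst, src, len);
          printf("hot:  %.3f GB/s\n",
                 ITERS * len / (now_sec() - t) / 1e9);

          /* Cache-cold approximation: fresh buffers on every pass, so
             the data is unlikely to still be in the cache */
          t = now_sec();
          for (i = 0; i < ITERS; i++) {
                  char *s = malloc(len), *d = malloc(len);

                  memset(s, i, len);              /* fault the pages in */
                  memcpy(d, s, len);
                  free(s);
                  free(d);
          }
          printf("cold: %.3f GB/s\n",
                 ITERS * len / (now_sec() - t) / 1e9);

          free(src);
          free(dst);
          return 0;
  }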

This tool can also be useful when developing these changes: the routines can 
be tested in user-space and iterated on very quickly, without having to build 
and boot the kernel.

We can commit any enhancements/fixes you make to perf bench alongside your 
memcpy patches. All in all, such measurements will make it much easier for us 
to apply the patches.

Thanks,

	Ingo