Date:	Fri, 06 Nov 2009 09:07:44 -0800
From:	"H. Peter Anvin" <hpa@...or.com>
To:	ling.ma@...el.com
CC:	mingo@...e.hu, tglx@...utronix.de, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by
 fast string.

On 11/06/2009 01:41 AM, ling.ma@...el.com wrote:
> 
>  Performance counter stats for './static_orig' (10 runs):
> 
>     2835.650105  task-clock-msecs         #      0.999 CPUs    ( +-   0.051% )
>               3  context-switches         #      0.000 M/sec   ( +-   6.503% )
>               0  CPU-migrations           #      0.000 M/sec   ( +-     nan% )
>            4429  page-faults              #      0.002 M/sec   ( +-   0.003% )
>      7941098692  cycles                   #   2800.451 M/sec   ( +-   0.051% )
>     10848100323  instructions             #      1.366 IPC     ( +-   0.000% )
>          322808  cache-references         #      0.114 M/sec   ( +-   1.467% )
>          280716  cache-misses             #      0.099 M/sec   ( +-   0.618% )
> 
>     2.838006377  seconds time elapsed   ( +-   0.051% )
> 
> 'perf stat --repeat 10 ./static_new' gives the following data after the patch:
> 
>  Performance counter stats for './static_new' (10 runs):
> 
>     7401.423466  task-clock-msecs         #      0.999 CPUs    ( +-   0.108% )
>              10  context-switches         #      0.000 M/sec   ( +-   2.797% )
>               0  CPU-migrations           #      0.000 M/sec   ( +-     nan% )
>            4428  page-faults              #      0.001 M/sec   ( +-   0.003% )
>     20727280183  cycles                   #   2800.445 M/sec   ( +-   0.107% )
>      1472673654  instructions             #      0.071 IPC     ( +-   0.013% )
>         1092221  cache-references         #      0.148 M/sec   ( +-  12.414% )
>          290550  cache-misses             #      0.039 M/sec   ( +-   1.577% )
> 
>     7.407006046  seconds time elapsed   ( +-   0.108% )
> 

I assume these are backwards?  If so, it's a dramatic performance
improvement.

Where did the 1024-byte threshold come from?  It seems a bit high to me,
and is at the very best a CPU-specific tuning factor.
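
(To make the cut-over concrete, the kind of size dispatch being discussed
looks roughly like the user-space sketch below.  This is not the patch
itself: the 1024 figure is simply carried over from the posting as an
assumed tuning value, and the small-copy path is stubbed out with plain
memcpy() where the real code has an unrolled loop.)

#include <stddef.h>
#include <string.h>

/* Assumed, CPU-specific tuning value taken from the posting. */
#define FAST_STRING_THRESHOLD	1024

static void *copy_rep_movsq(void *dst, const void *src, size_t len)
{
	void *ret = dst;
	size_t qwords = len >> 3;
	size_t tail = len & 7;

	/* Fast-string copy: 8 bytes per iteration via the string engine. */
	asm volatile("rep movsq"
		     : "+D" (dst), "+S" (src), "+c" (qwords)
		     :
		     : "memory");

	/* Copy the remaining 0-7 bytes; dst/src were advanced by the asm. */
	memcpy(dst, src, tail);
	return ret;
}

void *memcpy_tuned(void *dst, const void *src, size_t len)
{
	if (len >= FAST_STRING_THRESHOLD)
		return copy_rep_movsq(dst, src, len);
	/* Small copies: stand-in for the unrolled small-copy path. */
	return memcpy(dst, src, len);
}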

Andi is of course correct that older CPUs might suffer (sadly enough),
which is why we'd at the very least need some idea of what the
performance impact on those older CPUs would look like.  At that point
we can decide either to just do the rep movs unconditionally, or to
consider some scheme where we point at different implementations for
different processors.  memcpy is probably one of the very few
operations for which something like that would make sense.
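
(Sketched in user space, such per-CPU dispatch amounts to a function
pointer resolved once from a CPU feature check; in-kernel it would
presumably be done through the alternatives machinery instead.  The
CPUID bit tested below is only a stand-in selector; which feature test
we would actually key off of is exactly the open question.)

#include <cpuid.h>
#include <stddef.h>
#include <string.h>

static void *memcpy_rep_movsb(void *dst, const void *src, size_t len)
{
	void *ret = dst;

	/* Byte-granular fast-string copy. */
	asm volatile("rep movsb"
		     : "+D" (dst), "+S" (src), "+c" (len)
		     :
		     : "memory");
	return ret;
}

static void *memcpy_generic(void *dst, const void *src, size_t len)
{
	/* Fallback for CPUs where rep movs is not a win. */
	return memcpy(dst, src, len);
}

/* Resolved once at startup, then called indirectly ever after. */
static void *(*memcpy_best)(void *, const void *, size_t) = memcpy_generic;

__attribute__((constructor))
static void pick_memcpy(void)
{
	unsigned int eax, ebx, ecx, edx;

	/*
	 * Stand-in selector: CPUID leaf 7, subleaf 0, EBX bit 9 ("enhanced
	 * rep movsb/stosb").  Any other feature test could be dropped in
	 * here without touching the callers.
	 */
	if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) &&
	    (ebx & (1u << 9)))
		memcpy_best = memcpy_rep_movsb;
}

void *memcpy_dispatch(void *dst, const void *src, size_t len)
{
	return memcpy_best(dst, src, len);
}

Callers only ever go through memcpy_dispatch(), so swapping the selection
criterion, or adding a third variant for the older parts, doesn't touch
them.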

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
