Message-ID: <8FED46E8A9CA574792FC7AACAC38FE7714FCDE2430@PDSMSX501.ccr.corp.intel.com>
Date: Wed, 28 Oct 2009 14:09:26 +0800
From: "Ma, Ling" <ling.ma@...el.com>
To: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: FW: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S
by avoid memory miss predication.
Hi Ingo,
Are there any other test cases we should run, or do you have any comments?
Best Regards
Ma Ling
________________________________________
From: Ma, Ling
Sent: October 26, 2009 16:26
To: 'mingo@...e.hu'
Cc: 'hpa@...or.com'; 'tglx@...utronix.de'; 'linux-kernel@...r.kernel.org'
Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by avoid memory miss predication.
We generated a new report for another case, in which the src offset is 0x45010 and the dst offset is 0x34020,
using 'perf stat --repeat 10' on './static_rsi_45010_rdi_34020_old' and './static_rsi_45010_rdi_34020_new'.
The test program I wrote:
for (i = 64; i < 4096 * 4; i++)
        do_memcpy(src, dst, i);
Before the patch:
 Performance counter stats for './static_rsi_45010_rdi_34020_old' (10 runs):

   54014.766012  task-clock-msecs   #      0.999 CPUs    ( +-   0.016% )
             80  context-switches   #      0.000 M/sec   ( +-   7.894% )
              0  CPU-migrations     #      0.000 M/sec   ( +-  66.667% )
           4429  page-faults        #      0.000 M/sec   ( +-   0.002% )
   136855571663  cycles             #   2533.670 M/sec   ( +-   0.016% )
    44524796868  instructions       #      0.325 IPC     ( +-   0.008% )
         771000  cache-references   #      0.014 M/sec   ( +-  10.397% )
         541785  cache-misses       #      0.010 M/sec   ( +-   4.203% )

   54.062799203  seconds time elapsed   ( +-   0.021% )
After the patch:
 Performance counter stats for './static_rsi_45010_rdi_34020_new' (10 runs):

    7570.357661  task-clock-msecs   #      0.999 CPUs    ( +-   0.350% )
             13  context-switches   #      0.000 M/sec   ( +-   9.320% )
              0  CPU-migrations     #      0.000 M/sec   ( +-     nan% )
           4429  page-faults        #      0.001 M/sec   ( +-   0.004% )
    19180782064  cycles             #   2533.669 M/sec   ( +-   0.349% )
    44462001104  instructions       #      2.318 IPC     ( +-   0.001% )
         383673  cache-references   #      0.051 M/sec   ( +-   4.112% )
         317436  cache-misses       #      0.042 M/sec   ( +-   1.607% )

    7.581541785  seconds time elapsed   ( +-   0.343% )
The patch improves performance by 54.062799203 / 7.581541785 = 7.13x in elapsed time.
If you need any other test reports, please let me know.
Thanks
Ma Ling