Message-ID: <C10D3FB0CD45994C8A51FEC1227CE22F15D777221E@shsmsx502.ccr.corp.intel.com>
Date: Mon, 18 Oct 2010 14:43:32 +0800
From: "Ma, Ling" <ling.ma@...el.com>
To: "miaox@...fujitsu.com" <miaox@...fujitsu.com>
CC: "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
Andi Kleen <andi@...stfloor.org>,
Thomas Gleixner <tglx@...utronix.de>,
"Zhao, Yakui" <yakui.zhao@...el.com>,
Linux Kernel <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH V2 -tip] lib,x86_64: improve the performance of memcpy()
for unaligned copy
"wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm"
rep_good causes memcpy() to jump to memcpy_c, so this patch is not actually run
on your CPU; we may continue to do further optimization on it later.
BTW, the improvement comes only from the Core2 shift-register optimization,
but most earlier CPUs are very sensitive to shift instructions because of the decode stage.
I have tested Atom, Opteron, and Nocona; the new patch is still better on them.
Thanks
Ling
-----Original Message-----
From: Miao Xie [mailto:miaox@...fujitsu.com]
Sent: Monday, October 18, 2010 2:35 PM
To: Ma, Ling
Cc: H. Peter Anvin; Ingo Molnar; Andi Kleen; Thomas Gleixner; Zhao, Yakui; Linux Kernel
Subject: Re: [PATCH V2 -tip] lib,x86_64: improve the performance of memcpy() for unaligned copy
On Mon, 18 Oct 2010 14:27:40 +0800, Ma, Ling wrote:
> Could you please send out the cpu info for this cpu model?
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
stepping : 6
cpu MHz : 1603.000
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm
bogomips : 5319.99
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
Thanks
Miao
>
> Thanks
> Ling
>
> -----Original Message-----
> From: Miao Xie [mailto:miaox@...fujitsu.com]
> Sent: Monday, October 18, 2010 2:24 PM
> To: Ma, Ling
> Cc: H. Peter Anvin; Ingo Molnar; Andi Kleen; Thomas Gleixner; Zhao, Yakui; Linux Kernel
> Subject: Re: [PATCH V2 -tip] lib,x86_64: improve the performance of memcpy() for unaligned copy
>
> On Fri, 15 Oct 2010 03:43:53 +0800, Ma, Ling wrote:
>> The attachment includes memcpy-kernel.c (build with cc -O2 memcpy-kernel.c
>> -o memcpy-kernel) and the unaligned test cases on Atom.
>
> I have tested on my Core2 Duo machine with your benchmark tool. The attachment is the test result. But the result differs from yours on Atom; it seems the performance is better with this patch.
>
> According to these two different results, maybe we need to optimize memcpy() per CPU model.
>
> Thanks
> Miao
>
>>
>> Thanks
>> Ling
>>
>> -----Original Message-----
>> From: Ma, Ling
>> Sent: Thursday, October 14, 2010 9:14 AM
>> To: 'H. Peter Anvin'; miaox@...fujitsu.com
>> Cc: Ingo Molnar; Andi Kleen; Thomas Gleixner; Zhao, Yakui; Linux
>> Kernel
>> Subject: RE: [PATCH V2 -tip] lib,x86_64: improve the performance of
>> memcpy() for unaligned copy
>>
>> Sure, I will post the benchmark tool and the benchmark results on 64-bit Atom soon.
>>
>> Thanks
>> Ling
>>
>> -----Original Message-----
>> From: H. Peter Anvin [mailto:hpa@...or.com]
>> Sent: Thursday, October 14, 2010 5:32 AM
>> To: miaox@...fujitsu.com
>> Cc: Ma, Ling; Ingo Molnar; Andi Kleen; Thomas Gleixner; Zhao, Yakui;
>> Linux Kernel
>> Subject: Re: [PATCH V2 -tip] lib,x86_64: improve the performance of
>> memcpy() for unaligned copy
>>
>> On 10/08/2010 02:02 AM, Miao Xie wrote:
>>> On Fri, 8 Oct 2010 15:42:45 +0800, Ma, Ling wrote:
>>>> Could you please give us the full address for each comparison result? We will do some tests on our machine.
>>>> For unaligned cases, older CPUs slow down when loads and stores cross a cache line, but for NHM there is no need to care about it.
>>>> By the way, in kernel 64-bit mode our accesses should be roughly 8-byte aligned.
>>>
>>> Do you need my benchmark tool? I think it would be helpful for your tests.
>>>
>>
>> If you could post the benchmark tool that would be great.
>>
>> -hpa
>
>
>