Message-ID: <C10D3FB0CD45994C8A51FEC1227CE22F15D77722D0@shsmsx502.ccr.corp.intel.com>
Date: Mon, 18 Oct 2010 16:01:13 +0800
From: "Ma, Ling" <ling.ma@...el.com>
To: "miaox@...fujitsu.com" <miaox@...fujitsu.com>
CC: "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
Andi Kleen <andi@...stfloor.org>,
Thomas Gleixner <tglx@...utronix.de>,
"Zhao, Yakui" <yakui.zhao@...el.com>,
Linux Kernel <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH V2 -tip] lib,x86_64: improve the performance of memcpy()
for unaligned copy
>> rep_good will cause memcpy to jump to memcpy_c, so this patch's code does not run;
>> we may continue to do further optimization on it later.
>Yes, but in fact the performance of memcpy_c is not better on some micro-architectures (such as
>Wolfdale-3M), especially in the unaligned cases, so we need to optimize it, and I think
>the first step of that optimization is optimizing the original code of memcpy().
As mentioned above, we will optimize memcpy_c further soon.
Two reasons (a rough sketch of a rep-movs copy follows below):
1. The movs instruction needs a long latency to start up.
2. The movs instruction is not good for the unaligned case.
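To make those two points concrete, here is a minimal user-space sketch of a rep movsq / rep movsb copy in the spirit of the kernel's memcpy_c variant (x86_64 GCC inline asm; the function name rep_movs_copy is made up for illustration, and this is not the kernel source):

/*
 * Sketch of a "rep movs"-style copy: bulk 8-byte copy with rep movsq,
 * then the remaining 0-7 bytes with rep movsb.  Illustrative only.
 */
#include <stddef.h>
#include <stdio.h>

static void *rep_movs_copy(void *dst, const void *src, size_t len)
{
	void *ret = dst;
	size_t qwords = len >> 3;	/* number of 8-byte chunks */
	size_t tail   = len & 7;	/* remaining 0-7 bytes */

	asm volatile("rep movsq"
		     : "+D" (dst), "+S" (src), "+c" (qwords)
		     : : "memory");
	asm volatile("rep movsb"
		     : "+D" (dst), "+S" (src), "+c" (tail)
		     : : "memory");
	return ret;
}

int main(void)
{
	char src[] = "rep movs copy example";
	char dst[sizeof(src)];

	rep_movs_copy(dst, src, sizeof(src));
	printf("%s\n", dst);
	return 0;
}

The fixed startup cost of rep movs and its behaviour when source/destination are not aligned are exactly the two issues listed above.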
>> BTW the improvement is only from the Core2 shift-register optimization,
>> but for most previous CPUs shift registers are very sensitive because of the decode stage.
>> I have tested Atom, Opteron, and Nocona; the new patch is still better.
>I think we can add a flag to make this improvement valid only for Core2 or other CPUs like it,
>just like X86_FEATURE_REP_GOOD.
We should optimize the memcpy_c function for Core2 in the future, I think.
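For what such a flag-gated choice could look like, here is a hypothetical user-space sketch that selects a copy routine once based on a CPU test. The helper names (cpu_prefers_rep_movs, select_copy_routine) are made up for illustration; the kernel itself gates memcpy_c via X86_FEATURE_REP_GOOD and the alternatives patching mechanism rather than a function pointer.

/*
 * Hypothetical sketch of flag-gated dispatch (user space, illustrative only):
 * test the CPU once, then route all copies through the preferred routine.
 */
#include <stddef.h>
#include <string.h>
#include <stdbool.h>

static void *copy_generic(void *dst, const void *src, size_t len)
{
	return memcpy(dst, src, len);	/* fallback path */
}

static void *copy_rep_movs(void *dst, const void *src, size_t len)
{
	/* Stand-in for a rep movsq/movsb routine such as the sketch above. */
	return memcpy(dst, src, len);
}

static bool cpu_prefers_rep_movs(void)
{
	/* Placeholder: a real flag would be derived from the CPU
	 * family/model (cpuid), the way the kernel sets feature bits. */
	return false;
}

static void *(*copy_fn)(void *, const void *, size_t) = copy_generic;

static void select_copy_routine(void)
{
	copy_fn = cpu_prefers_rep_movs() ? copy_rep_movs : copy_generic;
}

int main(void)
{
	char src[32] = "flag-gated copy example";
	char dst[32];

	select_copy_routine();
	copy_fn(dst, src, sizeof(src));
	return 0;
}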
Thanks
Ling