Date: Tue, 21 Dec 2010 00:34:39 +0900
From: Hitoshi Mitake <mitake@....info.waseda.ac.jp>
To: miaox@...fujitsu.com
Cc: Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <a.p.zijlstra@...llo.nl>,
 linux-kernel@...r.kernel.org, ling.ma@...el.com,
 Zhao Yakui <yakui.zhao@...el.com>,
 Arnaldo Carvalho de Melo <acme@...hat.com>,
 Paul Mackerras <paulus@...ba.org>,
 Frederic Weisbecker <fweisbec@...il.com>,
 Steven Rostedt <rostedt@...dmis.org>,
 Thomas Gleixner <tglx@...utronix.de>,
 "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 1/2] perf bench: port memcpy_64.S to perf bench

On Mon, Dec 20, 2010 at 15:30, Miao Xie <miaox@...fujitsu.com> wrote:
> On Sun, 19 Dec 2010 01:25:00 +0900, Hitoshi Mitake wrote:
>>
>> On October 31, 2010 04:21, Ingo Molnar wrote:
>>>
>>> * Peter Zijlstra<a.p.zijlstra@...llo.nl> wrote:
>>>
>>>> On Sat, 2010-10-30 at 01:01 +0900, Hitoshi Mitake wrote:
>>>>>
>>>>> This patch ports arch/x86/lib/memcpy_64.S to "perf bench mem".
>>>>> When PERF_BENCH is defined at the preprocessor level,
>>>>> memcpy_64.S is preprocessed into a form that can be included
>>>>> from the sources under tools/perf for benchmarking programs.
>>>>>
>>>>> Signed-off-by: Hitoshi Mitake<mitake@....info.waseda.ac.jp>
>>>>> Cc: Ma Ling<ling.ma@...el.com>
>>>>> Cc: Zhao Yakui<yakui.zhao@...el.com>
>>>>> Cc: Peter Zijlstra<a.p.zijlstra@...llo.nl>
>>>>> Cc: Arnaldo Carvalho de Melo<acme@...hat.com>
>>>>> Cc: Paul Mackerras<paulus@...ba.org>
>>>>> Cc: Frederic Weisbecker<fweisbec@...il.com>
>>>>> Cc: Steven Rostedt<rostedt@...dmis.org>
>>>>> Cc: Thomas Gleixner<tglx@...utronix.de>
>>>>> Cc: H. Peter Anvin<hpa@...or.com>
>>>>> ---
>>>>> arch/x86/lib/memcpy_64.S | 30 ++++++++++++++++++++++++++++++
>>>>> 1 files changed, 30 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
>>>>> index 75ef61e..72c6dfe 100644
>>>>> --- a/arch/x86/lib/memcpy_64.S
>>>>> +++ b/arch/x86/lib/memcpy_64.S
>>>>> @@ -1,10 +1,23 @@
>>>>> /* Copyright 2002 Andi Kleen */
>>>>>
>>>>> +/*
>>>>> + * perf bench adoption by Hitoshi Mitake
>>>>> + * PERF_BENCH means that this file is included from
>>>>> + * the source files under tools/perf/ for benchmark programs.
>>>>> + *
>>>>> + * You don't have to care about PERF_BENCH when
>>>>> + * you are working on the kernel.
>>>>> + */
>>>>> +
>>>>> +#ifndef PERF_BENCH
>>>>
>>>> I don't like littering the actual kernel code with tools/perf/
>>>> ifdeffery..
>>>
>>>
>>> Yeah - could we somehow accept that file into a perf build as-is?
>>>
>>> Thanks,
>>>
>>> Ingo
>>>
>>
>> Really sorry for my slow work...
>>
>> BTW, I have a question for Miao and Ingo.
>> We are planning to implement Miao's new memcpy(), and the
>> important point is to keep the previous memcpy() for future
>> architectures and for benchmarking.
>>
>> I feel that adding a new CPU feature flag (like X86_FEATURE_REP_GOOD)
>> and switching memcpy() with the alternatives mechanism is a good way.
>> (So we would have three memcpy()s: a rep-based one, an unrolled one,
>> and the new unaligned-oriented one.)
>> But there is another way: #ifdef. Which do you prefer?
>
> I agree with your idea, but Ma Ling said this approach may cause
> i-cache misses.
> http://marc.info/?l=linux-kernel&m=128746120107953&w=2
> (The i-cache is 32K; memcpy() in my patch is 560 bytes, and the
> latest version in the tip tree is 400 bytes.)
>
> But I have not tested it, so I don't know the real result. Maybe we
> should try to implement the new memcpy() first.
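The choice being discussed, several memcpy() variants with one picked according to CPU feature flags, can be illustrated with a small dispatch sketch. It is written in Python only for brevity, and every name in it (copy_rep, copy_unrolled, copy_unaligned, select_memcpy, and the fast_unaligned flag) is hypothetical; rep_good merely mirrors the spirit of X86_FEATURE_REP_GOOD. The kernel's alternatives mechanism works differently: it patches the instructions in place once at boot, so there is no lookup or indirect call on each copy. The #ifdef option would instead fix the choice at compile time.

```python
"""Sketch: choosing one of several memcpy() variants from CPU feature flags.

All names are hypothetical. The copy bodies are plain-Python stand-ins;
only the selection logic is the point of the example.
"""
from typing import Callable

CopyFn = Callable[[bytearray, bytes], None]


def copy_rep(dst: bytearray, src: bytes) -> None:
    # Stand-in for the "rep movs" variant: one bulk copy.
    dst[: len(src)] = src


def copy_unrolled(dst: bytearray, src: bytes) -> None:
    # Stand-in for the unrolled variant: fixed 8-byte chunks, then the tail.
    n = len(src)
    i = 0
    while i + 8 <= n:
        dst[i : i + 8] = src[i : i + 8]
        i += 8
    dst[i:n] = src[i:n]


def copy_unaligned(dst: bytearray, src: bytes) -> None:
    # Placeholder for the new unaligned-oriented variant; its real
    # strategy is not modelled here, so this is a simple byte loop.
    for i, b in enumerate(src):
        dst[i] = b


def select_memcpy(rep_good: bool, fast_unaligned: bool) -> CopyFn:
    # Hypothetical preference order: the new flag first, then REP_GOOD,
    # falling back to the unrolled loop when neither flag is set.
    if fast_unaligned:
        return copy_unaligned
    if rep_good:
        return copy_rep
    return copy_unrolled


if __name__ == "__main__":
    src = b"perf bench mem memcpy"
    for flags in [(False, False), (True, False), (True, True)]:
        fn = select_memcpy(*flags)
        dst = bytearray(len(src))
        fn(dst, src)
        assert bytes(dst) == src
        print(flags, "->", fn.__name__)
```

Keeping every variant in the build is what allows them to be benchmarked side by side, which is the requirement stated in the thread.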
I compared the icache-miss behaviour of the two memcpy() implementations
using my new --wait-on patch (https://patchwork.kernel.org/patch/408801/).
The results are:

Default memcpy() of the tip tree:
% sudo ./perf stat -w /tmp/perf-stat-wait -e L1-icache-load-misses

 Performance counter stats for process id '12559':

            64,328 L1-icache-load-misses

        0.106513157  seconds time elapsed

Miao Xie's memcpy():
% sudo ./perf stat -w /tmp/perf-stat-wait -e L1-icache-misses

 Performance counter stats for process id '13159':

            64,559 L1-icache-load-misses

        0.107057925  seconds time elapsed

It seems that the new memcpy() does not cause a serious increase in
icache misses.

# I ran perf bench mem memcpy on a Core i3 M 330 processor.

But I don't understand the cache characteristics of Intel processors
well, so I have to look into this problem more deeply.

>
>> And could you tell me the details of the CPU families
>> you are targeting, Miao?
>
> They are Core2 Duo E7300 (core name: Wolfdale) and Xeon X5260 (core
> name: Wolfdale-DP).
>
> The following is the detailed information for these two CPUs:
> Core2 Duo E7300:
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
> stepping        : 6
> cpu MHz         : 1603.000
> cache size      : 3072 KB
> physical id     : 0
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> apicid          : 1
> initial apicid  : 1
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 10
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
> constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor
> ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm dts
> bogomips        : 5319.70
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
> Xeon X5260:
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Xeon(R) CPU X5260 @ 3.33GHz
> stepping        : 6
> cpu MHz         : 1999.000
> cache size      : 6144 KB
> physical id     : 3
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> apicid          : 7
> initial apicid  : 7
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 10
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall lm
> constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor
> ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dts tpr_shadow
> vnmi flexpriority
> bogomips        : 6649.07
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 38 bits physical, 48 bits virtual
> power management:
>

Thanks for your information!

Thanks,
Hitoshi
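To put the two perf stat runs in this message into proportion, the short calculation below works out the relative differences between them, and also the share of a 32K i-cache taken by the 560-byte and 400-byte memcpy() bodies quoted in the thread. The figures are copied from the message; reading "32K" as 32,768 bytes is an assumption, and the constant and helper names are illustrative only.

```python
# Figures copied from the perf stat output in the message.
TIP_MISSES, TIP_SECONDS = 64_328, 0.106513157    # default memcpy() of the tip tree
MIAO_MISSES, MIAO_SECONDS = 64_559, 0.107057925  # Miao Xie's memcpy()

ICACHE_BYTES = 32 * 1024        # "32K" i-cache, assumed to mean 32,768 bytes
MIAO_SIZE, TIP_SIZE = 560, 400  # memcpy() sizes in bytes, as quoted


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def pct_of(part: float, whole: float) -> float:
    """part expressed as a percentage of whole."""
    return part / whole * 100.0


extra_misses = MIAO_MISSES - TIP_MISSES
miss_pct = pct_change(TIP_MISSES, MIAO_MISSES)
time_pct = pct_change(TIP_SECONDS, MIAO_SECONDS)

if __name__ == "__main__":
    print(f"extra L1-icache-load-misses: {extra_misses} ({miss_pct:.2f}%)")
    print(f"elapsed time change: {time_pct:.2f}%")
    print(f"i-cache share: new {pct_of(MIAO_SIZE, ICACHE_BYTES):.2f}%, "
          f"tip {pct_of(TIP_SIZE, ICACHE_BYTES):.2f}%")
```

The new memcpy() shows 231 more misses (about 0.36%) and about 0.51% more elapsed time, and the two bodies occupy roughly 1.71% and 1.22% of the i-cache. With a single run per variant the run-to-run noise is unknown, so these numbers are consistent with the "no serious increase" reading rather than proof of it.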