lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Tue, 21 Dec 2010 00:34:39 +0900
From:	Hitoshi Mitake <mitake@....info.waseda.ac.jp>
To:	miaox@...fujitsu.com
Cc:	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	linux-kernel@...r.kernel.org, ling.ma@...el.com,
	Zhao Yakui <yakui.zhao@...el.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Paul Mackerras <paulus@...ba.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 1/2] perf bench: port memcpy_64.S to perf bench

On Mon, Dec 20, 2010 at 15:30, Miao Xie <miaox@...fujitsu.com> wrote:
> On Sun, 19 Dec 2010 01:25:00 +0900, Hitoshi Mitake wrote:
>>
>> On 2010年10月31日 04:21, Ingo Molnar wrote:
>>>
>>> * Peter Zijlstra<a.p.zijlstra@...llo.nl> wrote:
>>>
>>>> On Sat, 2010-10-30 at 01:01 +0900, Hitoshi Mitake wrote:
>>>>>
>>>>> This patch ports arch/x86/lib/memcpy_64.S to "perf bench mem".
>>>>> When PERF_BENCH is defined at preprocessor level,
>>>>> memcpy_64.S is preprocessed to includable form from the sources
>>>>> under tools/perf for benchmarking programs.
>>>>>
>>>>> Signed-off-by: Hitoshi Mitake<mitake@....info.waseda.ac.jp>
>>>>> Cc: Ma Ling:<ling.ma@...el.com>
>>>>> Cc: Zhao Yakui<yakui.zhao@...el.com>
>>>>> Cc: Peter Zijlstra<a.p.zijlstra@...llo.nl>
>>>>> Cc: Arnaldo Carvalho de Melo<acme@...hat.com>
>>>>> Cc: Paul Mackerras<paulus@...ba.org>
>>>>> Cc: Frederic Weisbecker<fweisbec@...il.com>
>>>>> Cc: Steven Rostedt<rostedt@...dmis.org>
>>>>> Cc: Thomas Gleixner<tglx@...utronix.de>
>>>>> Cc: H. Peter Anvin<hpa@...or.com>
>>>>> ---
>>>>> arch/x86/lib/memcpy_64.S | 30 ++++++++++++++++++++++++++++++
>>>>> 1 files changed, 30 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
>>>>> index 75ef61e..72c6dfe 100644
>>>>> --- a/arch/x86/lib/memcpy_64.S
>>>>> +++ b/arch/x86/lib/memcpy_64.S
>>>>> @@ -1,10 +1,23 @@
>>>>> /* Copyright 2002 Andi Kleen */
>>>>>
>>>>> +/*
>>>>> + * perf bench adoption by Hitoshi Mitake
>>>>> + * PERF_BENCH means that this file is included from
>>>>> + * the source files under tools/perf/ for benchmark programs.
>>>>> + *
>>>>> + * You don't have to care about PERF_BENCH when
>>>>> + * you are working on the kernel.
>>>>> + */
>>>>> +
>>>>> +#ifndef PERF_BENCH
>>>>
>>>> I don't like littering the actual kernel code with tools/perf/
>>>> ifdeffery..
>>>
>>>
>>> Yeah - could we somehow accept that file into a perf build as-is?
>>>
>>> Thanks,
>>>
>>> Ingo
>>>
>>
>> Really sorry for my slow work...
>>
>> BTW, I have a question for Miao and Ingo.
>> We are planning to implement new memcpy() of Miao,
>> and the important point is not removing previous memcpy()
>> for future architectures and benchmarkings.
>>
>> I feel that adding new CPU feature flag (like X86_FEATURE_REP_GOOD)
>> and switching memcpy() with alternative mechanism is good way.
>> (So we will have three memcpy()s: rep based, unrolled, and new
>> unaligned oriented one)
>> But there is another way: #ifdef. Which do you prefer?
>
> I agree with your idea, but Ma Ling said this way may cause the i-cache
> miss problem.
>  http://marc.info/?l=linux-kernel&m=128746120107953&w=2
> (The size of the i-cache is 32K, the size of memcpy() in my patch is
> 560Byte,
> and the size of the last version in tip tree is 400Byte).
>
> But I have not tested it, so I don't know the real result. Maybe we should
> try to implement the new memcpy() first.

I compared memcpy()'s icache miss behaviour with my new
--wait-on patch ( https://patchwork.kernel.org/patch/408801/ ).
And the result is,

default of tip tree

% sudo ./perf stat -w /tmp/perf-stat-wait -e L1-icache-load-misses

 Performance counter stats for process id '12559':

            64,328 L1-icache-load-misses

        0.106513157  seconds time elapsed

Miao Xie's memcpy()

% sudo ./perf stat -w /tmp/perf-stat-wait -e L1-icache-misses

 Performance counter stats for process id '13159':

            64,559 L1-icache-load-misses

        0.107057925  seconds time elapsed

It seems that there is no fatal icache miss.
# I tested perf bench mem memcpy with Core i3 M 330 processor.

But I don't understand well about cache characteristics of intel processor.
I have to look at this problem more deeply.

>
>> And could you tell me the detail of CPU family information
>> you are targeting, Miao?
>
> They are  Core2 Duo E7300(Core name: Wolfdale) and Xeon X5260(Core name:
> Wolfdale-DP).
>
> The following is the detailed information of these two CPU:
> Core2 Duo E7300:
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Core(TM)2 Duo CPU     E7300  @ 2.66GHz
> stepping        : 6
> cpu MHz         : 1603.000
> cache size      : 3072 KB
> physical id     : 0
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> apicid          : 1
> initial apicid  : 1
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 10
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
> constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor
> ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm dts
> bogomips        : 5319.70
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
> Xeon X5260:
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Xeon(R) CPU           X5260  @ 3.33GHz
> stepping        : 6
> cpu MHz         : 1999.000
> cache size      : 6144 KB
> physical id     : 3
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> apicid          : 7
> initial apicid  : 7
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 10
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall lm
> constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor
> ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dts tpr_shadow
> vnmi flexpriority
> bogomips        : 6649.07
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 38 bits physical, 48 bits virtual
> power management:
>

Thanks for your information!

Thanks,
        Hitoshi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists