linux-kernel - Re: [PATCH 0/5] ftrace: to kill a daemon

Message-ID: <20080813063126.GA12335@Krystal>
Date:	Wed, 13 Aug 2008 02:31:26 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	Andi Kleen <andi@...stfloor.org>,
	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Miller <davem@...emloft.net>,
	Roland McGrath <roland@...hat.com>,
	Ulrich Drepper <drepper@...hat.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Gregory Haskins <ghaskins@...ell.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	"Luis Claudio R. Goncalves" <lclaudio@...g.org>,
	Clark Williams <williams@...hat.com>
Subject: Re: [PATCH 0/5] ftrace: to kill a daemon

* Steven Rostedt (rostedt@...dmis.org) wrote:
> 
> On Fri, 8 Aug 2008, Linus Torvalds wrote:
> > 
> > 
> > On Fri, 8 Aug 2008, Jeremy Fitzhardinge wrote:
> > >
> > > Steven Rostedt wrote:
> > > > I wish we had a true 5 byte nop. 
> > > 
> > > 0x66 0x66 0x66 0x66 0x90
> > 
> > I don't think so. Multiple redundant prefixes can be really expensive on 
> > some uarchs.
> > 
> > A no-op that isn't cheap isn't a no-op at all, it's a slow-op.
> 
> 
> A quick meaningless benchmark showed a slight performance hit.
> 

Hi Steven,

I ran my own tests to see whether these numbers are meaningful at all.
My results seem to show no significant difference between the various
configurations. The only tendency I see is that the 2-bytes jump with
offset 0x03 would be slightly faster than the 3/2 nop on Intel Xeon, but
we would have to run these more times to confirm that, I guess.

I am just trying to get a sense of whether we are working hard to
optimize something that is worthless in practice, and to me it looks
like we are. But these results could be specific to the architecture I
am using.

Mathieu

Intel Xeon dual quad-core
Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz

3/2 nop used :
K8_NOP3 K8_NOP2
#define K8_NOP2 ".byte 0x66,0x90\n"
#define K8_NOP3 ".byte 0x66,0x66,0x90\n"
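
For readers who want the byte-level picture, here is a minimal sketch of
the 5-byte fillers being compared. The two K8 nop encodings come from the
defines above. The jump encodings are an assumption based on the standard
x86 opcodes (0xEB for jmp rel8, 0xE9 for jmp rel32), and the PAD byte
skipped by the short jump is purely hypothetical since this mail does not
say what sits there.

```python
# Hypothetical byte-level view of the 5-byte call-site fillers in this mail.
K8_NOP2 = bytes([0x66, 0x90])        # from the mail
K8_NOP3 = bytes([0x66, 0x66, 0x90])  # from the mail

PAD = 0x90  # assumption: filler skipped by the short jump, never executed

CANDIDATES = {
    "3/2 nops": K8_NOP3 + K8_NOP2,
    "2-bytes jump, offset 0x03": bytes([0xEB, 0x03, PAD, PAD, PAD]),
    "5-bytes jump, offset 0x00": bytes([0xE9, 0x00, 0x00, 0x00, 0x00]),
    "prefixed nop": bytes([0x66, 0x66, 0x66, 0x66, 0x90]),
}


def jump_target(code: bytes) -> int:
    """Offset, from the start of the filler, where a leading jmp lands."""
    if code[0] == 0xEB:  # jmp rel8: 2-byte instruction + signed 8-bit offset
        return 2 + int.from_bytes(code[1:2], "little", signed=True)
    if code[0] == 0xE9:  # jmp rel32: 5-byte instruction + signed 32-bit offset
        return 5 + int.from_bytes(code[1:5], "little", signed=True)
    raise ValueError("filler does not start with a jmp")


for name, code in CANDIDATES.items():
    print(f"{name:28s} {code.hex(' ')}")
```

Every candidate occupies exactly 5 bytes, and both jump variants resume
execution at offset 5, right after the filler.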

** Summary **

Test A : make -j20 2.6.27-rc2 kernel (real time)
                                          Avg.      std.dev
Case 1 : ftrace not compiled-in.          1m9.76s   0.41s
Case 2 : 3/2 nops                         1m9.95s   0.36s
Case 3 : 2-bytes jump, offset 0x03        1m9.10s   0.40s
Case 4 : 5-bytes jump, offset 0x00        1m9.25s   0.34s

Test B : hackbench 15
                                          Avg.      std.dev
Case 1 : ftrace not compiled-in.          0.349s    0.007s
Case 2 : 3/2 nops                         0.351s    0.014s
Case 3 : 2-bytes jump, offset 0x03        0.350s    0.007s
Case 4 : 5-bytes jump, offset 0x00        0.351s    0.010s



** Detail **

* Test A

benchmark : make -j20 2.6.27-rc2 kernel
make clean; make -j20; make clean done before the tests to prime caches.
Same .config used.


Case 1 : ftrace not compiled-in.

real	1m9.980s
user	7m27.664s
sys	0m48.771s

real	1m9.330s
user	7m27.244s
sys	0m50.567s

real	1m9.393s
user	7m27.408s
sys	0m50.511s

real	1m9.674s
user	7m28.088s
sys	0m50.327s

real	1m10.441s
user	7m27.736s
sys	0m49.687s

real time
average : 1m9.76s
std. dev. : 0.41s
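
The average and std. dev. above can be reproduced from the five "real"
lines. A minimal sketch, assuming the population standard deviation
(divide by N) is what was used, since that is what matches the posted
0.41s; parse_real is a helper name made up for this example.

```python
import re
import statistics


def parse_real(text: str) -> float:
    """Convert a `time` value such as '1m9.980s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" times of the five Case 1 runs listed above.
CASE1 = ["1m9.980s", "1m9.330s", "1m9.393s", "1m9.674s", "1m10.441s"]

seconds = [parse_real(t) for t in CASE1]
mean = statistics.mean(seconds)
spread = statistics.pstdev(seconds)  # population std. dev. (divide by N)

print(f"average : {int(mean // 60)}m{mean % 60:.2f}s")
print(f"std. dev. : {spread:.2f}s")
```

The sample standard deviation (statistics.stdev, divide by N-1) would
come out slightly larger for the same five runs.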

after a reboot with the same kernel :

real	1m8.758s
user	7m26.012s
sys	0m48.835s

real	1m11.035s
user	7m26.432s
sys	0m49.171s

real	1m9.834s
user	7m25.768s
sys	0m49.167s


Case 2 : 3/2 nops

real	1m9.713s
user	7m27.524s
sys	0m48.315s

real	1m9.481s
user	7m27.144s
sys	0m48.587s

real	1m10.565s
user	7m27.048s
sys	0m48.715s

real	1m10.008s
user	7m26.436s
sys	0m49.295s

real	1m9.982s
user	7m27.160s
sys	0m48.667s

real time
avg : 1m9.95s
std. dev. : 0.36s


Case 3 : 2-bytes jump, offset 0x03

real	1m9.158s
user	7m27.108s
sys	0m48.775s

real	1m9.159s
user	7m27.320s
sys	0m48.659s

real	1m8.390s
user	7m27.976s
sys	0m48.359s

real	1m9.143s
user	7m26.624s
sys	0m48.719s

real	1m9.642s
user	7m26.228s
sys	0m49.483s

real time
avg : 1m9.10s
std. dev. : 0.40s

one extra after reboot with same kernel :

real	1m8.855s
user	7m27.372s
sys	0m48.543s


Case 4 : 5-bytes jump, offset 0x00

real	1m9.173s
user	7m27.228s
sys	0m48.151s

real	1m9.735s
user	7m26.852s
sys	0m48.499s

real	1m9.502s
user	7m27.148s
sys	0m48.107s

real	1m8.727s
user	7m27.416s
sys	0m48.071s

real	1m9.115s
user	7m26.932s
sys	0m48.727s

real time
avg : 1m9.25s
std. dev. : 0.34s
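
One way to put a number on "slightly faster" is Welch's t statistic
between two cases. This is only a sketch using the Case 2 and Case 3
"real" times above, converted to seconds; the choice of test is an
assumption of this example, not something used in the measurements.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variance."""
    var_a = statistics.variance(a)  # sample variance (divide by N-1)
    var_b = statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


# Test A "real" times in seconds, copied from the runs above.
CASE2_NOPS = [69.713, 69.481, 70.565, 70.008, 69.982]  # 3/2 nops
CASE3_JMP2 = [69.158, 69.159, 68.390, 69.143, 69.642]  # 2-bytes jump

t = welch_t(CASE2_NOPS, CASE3_JMP2)
print(f"Welch t (3/2 nops vs 2-bytes jump) = {t:.2f}")
```

The statistic comes out at roughly 3.1, which fits the tendency noted
for the 2-bytes jump. With only five runs per case, and the Case 1 runs
after a reboot ranging from 1m8.758s to 1m11.035s, it is probably not
enough to draw a conclusion from.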


* Test B

Hackbench

Case 1 : ftrace not compiled-in.

./hackbench 15
Time: 0.358
./hackbench 15
Time: 0.342
./hackbench 15
Time: 0.354
./hackbench 15
Time: 0.338
./hackbench 15
Time: 0.347

Average : 0.349
std. dev. : 0.007

Case 2 : 3/2 nops

./hackbench 15
Time: 0.328
./hackbench 15
Time: 0.368
./hackbench 15
Time: 0.351
./hackbench 15
Time: 0.343
./hackbench 15
Time: 0.366

Average : 0.351
std. dev. : 0.014

Case 3 : 2-bytes jump, offset 0x03

./hackbench 15
Time: 0.346
./hackbench 15
Time: 0.359
./hackbench 15
Time: 0.356
./hackbench 15
Time: 0.350
./hackbench 15
Time: 0.340

Average : 0.350
std. dev. : 0.007

Case 4 : 5-bytes jump, offset 0x00

./hackbench 15
Time: 0.346
./hackbench 15
Time: 0.346
./hackbench 15
Time: 0.364
./hackbench 15
Time: 0.362
./hackbench 15
Time: 0.338

Average : 0.351
std. dev. : 0.010


Hardware used :

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping	: 6
cpu MHz		: 2000.114
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx tm2 ssse3
cx16 xtpr dca sse4_1 lahf_lm
bogomips	: 4000.22
clflush size	: 64
cache_alignment	: 64
address sizes	: 38 bits physical, 48 bits virtual
power management:

(7 other similar cpus)


> Here's 10 runs of "hackbench 50" using the two part 5 byte nop:
> 
> run 1
> Time: 4.501
> run 2
> Time: 4.855
> run 3
> Time: 4.198
> run 4
> Time: 4.587
> run 5
> Time: 5.016
> run 6
> Time: 4.757
> run 7
> Time: 4.477
> run 8
> Time: 4.693
> run 9
> Time: 4.710
> run 10
> Time: 4.715
> avg = 4.6509
> 
> 
> And 10 runs using the above 5 byte nop:
> 
> run 1
> Time: 4.832
> run 2
> Time: 5.319
> run 3
> Time: 5.213
> run 4
> Time: 4.830
> run 5
> Time: 4.363
> run 6
> Time: 4.391
> run 7
> Time: 4.772
> run 8
> Time: 4.992
> run 9
> Time: 4.727
> run 10
> Time: 4.825
> avg = 4.8264
> 
> # cat /proc/cpuinfo
> processor	: 0
> vendor_id	: AuthenticAMD
> cpu family	: 15
> model		: 65
> model name	: Dual-Core AMD Opteron(tm) Processor 2220
> stepping	: 3
> cpu MHz		: 2799.992
> cache size	: 1024 KB
> physical id	: 0
> siblings	: 2
> core id		: 0
> cpu cores	: 2
> apicid		: 0
> initial apicid	: 0
> fdiv_bug	: no
> hlt_bug		: no
> f00f_bug	: no
> coma_bug	: no
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 1
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
> rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic 
> cr8_legacy
> bogomips	: 5599.98
> clflush size	: 64
> power management: ts fid vid ttp tm stc
> 
> There's 4 of these.
> 
> Just to make sure, I ran the above nop test again:
> 
> [ this is reverse from the above runs ]
> 
> run 1
> Time: 4.723
> run 2
> Time: 5.080
> run 3
> Time: 4.521
> run 4
> Time: 4.841
> run 5
> Time: 4.696
> run 6
> Time: 4.946
> run 7
> Time: 4.754
> run 8
> Time: 4.717
> run 9
> Time: 4.905
> run 10
> Time: 4.814
> avg = 4.7997
> 
> And again the two part nop:
> 
> run 1
> Time: 4.434
> run 2
> Time: 4.496
> run 3
> Time: 4.801
> run 4
> Time: 4.714
> run 5
> Time: 4.631
> run 6
> Time: 5.178
> run 7
> Time: 4.728
> run 8
> Time: 4.920
> run 9
> Time: 4.898
> run 10
> Time: 4.770
> avg = 4.757
> 
> 
> This time it was close, but still seems to have some difference.
> 
> heh, perhaps it's just noise.
> 
> -- Steve
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68