Message-ID: <20210312205914.GG22098@zn.tnic>
Date: Fri, 12 Mar 2021 21:59:14 +0100
From: Borislav Petkov <bp@...en8.de>
To: Peter Zijlstra <peterz@...radead.org>
Cc: x86@...nel.org, rostedt@...dmis.org, hpa@...or.com,
torvalds@...uxfoundation.org, linux-kernel@...r.kernel.org,
linux-toolchains@...r.kernel.org, jpoimboe@...hat.com,
alexei.starovoitov@...il.com, mhiramat@...nel.org
Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]
On Fri, Mar 12, 2021 at 12:32:53PM +0100, Peter Zijlstra wrote:
> Since ultimate performance of a 10 year old chip (Intel Sandy Bridge, 2011) is
> simply irrelevant today, remove variable NOPs and use NOPL.
Just ran them on my SNB box:
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
stepping : 7
with the usual perf stat kernel build workload with
CONFIG_DYNAMIC_FTRACE and CONFIG_FUNCTION_TRACER where each function has
a NOP at its beginning when ftrace is disabled (thx Steve).
./tools/perf/perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j9 bzImage
before: tip-master
Performance counter stats for 'make -s -j9 bzImage' (5 runs):

      3,213,728.10 msec task-clock                #   7.307 CPUs utilized            ( +- 0.01% )
           339,270      context-switches          #   0.106 K/sec                    ( +- 0.09% )
            31,472      cpu-migrations            #   0.010 K/sec                    ( +- 0.64% )
        62,070,684      page-faults               #   0.019 M/sec                    ( +- 0.01% )
11,498,198,009,323      cycles                    #   3.578 GHz                      ( +- 0.01% )  (83.33%)
 8,235,957,366,696      stalled-cycles-frontend   #  71.63% frontend cycles idle     ( +- 0.01% )  (83.33%)
 5,976,456,688,814      stalled-cycles-backend    #  51.98% backend cycles idle      ( +- 0.02% )  (66.67%)
 7,553,156,344,376      instructions              #    0.66 insn per cycle
                                                  #    1.09 stalled cycles per insn  ( +- 0.00% )  (83.33%)
 1,635,468,917,524      branches                  # 508.901 M/sec                    ( +- 0.00% )  (83.34%)
    51,888,292,932      branch-misses             #   3.17% of all branches          ( +- 0.02% )  (83.33%)

           439.809 +- 0.156 seconds time elapsed  ( +- 0.04% )
after: tip-master-nops
Performance counter stats for 'make -s -j9 bzImage' (5 runs):

      3,217,113.67 msec task-clock                #   7.307 CPUs utilized            ( +- 0.03% )
           339,425      context-switches          #   0.106 K/sec                    ( +- 0.20% )
            31,724      cpu-migrations            #   0.010 K/sec                    ( +- 0.54% )
        62,027,130      page-faults               #   0.019 M/sec                    ( +- 0.01% )
11,508,779,965,901      cycles                    #   3.577 GHz                      ( +- 0.03% )  (83.34%)
 8,241,212,210,440      stalled-cycles-frontend   #  71.61% frontend cycles idle     ( +- 0.04% )  (83.33%)
 5,982,615,533,177      stalled-cycles-backend    #  51.98% backend cycles idle      ( +- 0.06% )  (66.66%)
 7,546,407,430,314      instructions              #    0.66 insn per cycle
                                                  #    1.09 stalled cycles per insn  ( +- 0.00% )  (83.33%)
 1,634,187,006,479      branches                  # 507.967 M/sec                    ( +- 0.00% )  (83.33%)
    51,941,580,371      branch-misses             #   3.18% of all branches          ( +- 0.01% )  (83.33%)

           440.266 +- 0.195 seconds time elapsed  ( +- 0.04% )
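
For anyone who wants the before/after difference as a percentage, here is a minimal Python sketch. The values are copied from the two perf stat summaries above; the helper names `pct_change` and `compare` and the dictionary keys are made up for illustration and are not part of perf or of the patch set.

```python
# Minimal sketch: relative change between the two perf stat runs.
# Numbers are copied from the summaries above; the function names and
# dictionary keys are hypothetical, chosen only for this example.

BEFORE = {  # tip-master
    "task_clock_msec": 3_213_728.10,
    "cycles": 11_498_198_009_323,
    "instructions": 7_553_156_344_376,
    "branch_misses": 51_888_292_932,
    "elapsed_sec": 439.809,
}

AFTER = {  # tip-master-nops
    "task_clock_msec": 3_217_113.67,
    "cycles": 11_508_779_965_901,
    "instructions": 7_546_407_430_314,
    "branch_misses": 51_941_580_371,
    "elapsed_sec": 440.266,
}


def pct_change(before, after):
    """Return the change from before to after, in percent of before."""
    return (after - before) / before * 100.0


def compare(before, after):
    """Map each counter name to its percentage change."""
    return {name: pct_change(before[name], after[name]) for name in before}


if __name__ == "__main__":
    for name, delta in compare(BEFORE, AFTER).items():
        print(f"{name:>16}: {delta:+.3f}%")
```

Each of these counters moves by roughly a tenth of a percent between the two runs, and the instruction count moves in the opposite direction from the others.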
So there: numbers talk, bullshit walks. And with numbers like these, no
bullshit can linger anyway.
Cheers!
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette