Message-ID: <20250314132306.GDZ9QtukcVVtDmW1V1@fat_crate.local>
Date: Fri, 14 Mar 2025 14:23:06 +0100
From: Borislav Petkov <bp@...en8.de>
To: Ingo Molnar <mingo@...nel.org>
Cc: Uros Bizjak <ubizjak@...il.com>, x86@...nel.org,
linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH] x86/asm: Use asm_inline() instead of asm() in
amd_clear_divider()
On Fri, Mar 14, 2025 at 11:17:43AM +0100, Ingo Molnar wrote:
> Here's a link for those who'd like to view this via the web:
>
> https://lore.kernel.org/all/174188884263.14745.1542926632284353047.tip-bot2@tip-bot2/

This is a perf measuring method I got from you, actually, from a long time
ago :-)

  ./tools/perf/perf stat -a --repeat 5 --sync --pre ~/bin/pre-build-kernel.sh -- make -s -j33 bzImage

* tip/master fdebf9c0efe4 ("Merge branch into tip/master: 'x86/sev'")

 Performance counter stats for 'system wide' (5 runs):

      4,144,101.54 msec cpu-clock                # 32.000 CPUs utilized          ( +- 0.10% )
           812,478      context-switches         # 196.056 /sec                  ( +- 0.15% )
            67,201      cpu-migrations           # 16.216 /sec                   ( +- 0.22% )
        48,228,560      page-faults              # 11.638 K/sec                  ( +- 0.01% )
 9,473,229,339,058      instructions             # 1.12 insn per cycle
                                                 # 0.21 stalled cycles per insn  ( +- 0.00% )
 8,476,070,185,458      cycles                   # 2.045 GHz                     ( +- 0.12% )
 1,988,775,653,131      stalled-cycles-frontend  # 23.46% frontend cycles idle   ( +- 0.14% )
 2,128,585,400,027      branches                 # 513.642 M/sec                 ( +- 0.00% )
    66,681,861,375      branch-misses            # 3.13% of all branches         ( +- 0.03% )

           129.504 +- 0.127 seconds time elapsed  ( +- 0.10% )

* tip/master with 9628d19e91f1 reverted

 Performance counter stats for 'system wide' (5 runs):

      4,141,057.45 msec cpu-clock                # 32.000 CPUs utilized          ( +- 0.15% )
           811,299      context-switches         # 195.916 /sec                  ( +- 0.08% )
            67,644      cpu-migrations           # 16.335 /sec                   ( +- 0.24% )
        48,209,829      page-faults              # 11.642 K/sec                  ( +- 0.00% )
 9,465,299,000,193      instructions             # 1.12 insn per cycle
                                                 # 0.21 stalled cycles per insn  ( +- 0.00% )
 8,487,239,564,102      cycles                   # 2.050 GHz                     ( +- 0.21% )
 1,992,414,836,889      stalled-cycles-frontend  # 23.48% frontend cycles idle   ( +- 0.08% )
 2,127,019,426,911      branches                 # 513.642 M/sec                 ( +- 0.00% )
    66,698,031,504      branch-misses            # 3.14% of all branches         ( +- 0.02% )

           129.408 +- 0.195 seconds time elapsed  ( +- 0.15% )

This is all within the noise.
Or maybe building the kernel even with those "optimized" inlining decisions
due to the asm being of length 1 for atomic locking insns simply doesn't matter.
Or maybe I need a different benchmark.
At least it ain't breaking anything...
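
For anyone who wants to sanity-check the "within the noise" reading, here is a
rough sketch that compares the two elapsed-time figures from the runs above. It
assumes the two +- values are independent and can be combined in quadrature,
which is a simplification and not something perf itself claims. The function
name and the k threshold are made up for illustration.

```python
import math

# Elapsed times in seconds, copied from the two perf stat runs above.
base_mean, base_err = 129.504, 0.127      # tip/master
revert_mean, revert_err = 129.408, 0.195  # 9628d19e91f1 reverted


def within_noise(m1, e1, m2, e2, k=1.0):
    """Return (verdict, delta, combined) for two mean +- spread measurements.

    Assumption: the spreads are independent, so they are added in quadrature.
    The verdict is True when the difference of the means is at most k times
    the combined spread.
    """
    delta = abs(m1 - m2)
    combined = math.hypot(e1, e2)
    return delta <= k * combined, delta, combined


ok, delta, combined = within_noise(base_mean, base_err, revert_mean, revert_err)
print(f"delta = {delta:.3f} s, combined spread = {combined:.3f} s, within noise: {ok}")
```

With these numbers the difference between the two means (about 0.096 s) is
smaller than either run's own +- value, so a different benchmark would be
needed to see any effect.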
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette