linux-kernel - Re: [PATCH] x86/asm: Use asm_inline() instead of asm() in amd_clear


Message-ID: <20250314132306.GDZ9QtukcVVtDmW1V1@fat_crate.local>
Date: Fri, 14 Mar 2025 14:23:06 +0100
From: Borislav Petkov <bp@...en8.de>
To: Ingo Molnar <mingo@...nel.org>
Cc: Uros Bizjak <ubizjak@...il.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH] x86/asm: Use asm_inline() instead of asm() in
 amd_clear_divider()

On Fri, Mar 14, 2025 at 11:17:43AM +0100, Ingo Molnar wrote:
> Here's a link for those who'd like to view this via the web:
> 
>   https://lore.kernel.org/all/174188884263.14745.1542926632284353047.tip-bot2@tip-bot2/

This is a perf measuring method I actually got from you, a long time
ago :-)

./tools/perf/perf stat -a --repeat 5 --sync --pre ~/bin/pre-build-kernel.sh -- make -s -j33 bzImage

* tip/master fdebf9c0efe4 ("Merge branch into tip/master: 'x86/sev'")

 Performance counter stats for 'system wide' (5 runs):

      4,144,101.54 msec cpu-clock                        #   32.000 CPUs utilized               ( +-  0.10% )
           812,478      context-switches                 #  196.056 /sec                        ( +-  0.15% )
            67,201      cpu-migrations                   #   16.216 /sec                        ( +-  0.22% )
        48,228,560      page-faults                      #   11.638 K/sec                       ( +-  0.01% )
 9,473,229,339,058      instructions                     #    1.12  insn per cycle            
                                                  #    0.21  stalled cycles per insn     ( +-  0.00% )
 8,476,070,185,458      cycles                           #    2.045 GHz                         ( +-  0.12% )
 1,988,775,653,131      stalled-cycles-frontend          #   23.46% frontend cycles idle        ( +-  0.14% )
 2,128,585,400,027      branches                         #  513.642 M/sec                       ( +-  0.00% )
    66,681,861,375      branch-misses                    #    3.13% of all branches             ( +-  0.03% )

           129.504 +- 0.127 seconds time elapsed  ( +-  0.10% )

* tip/master with 9628d19e91f1 reverted

 Performance counter stats for 'system wide' (5 runs):

      4,141,057.45 msec cpu-clock                        #   32.000 CPUs utilized               ( +-  0.15% )
           811,299      context-switches                 #  195.916 /sec                        ( +-  0.08% )
            67,644      cpu-migrations                   #   16.335 /sec                        ( +-  0.24% )
        48,209,829      page-faults                      #   11.642 K/sec                       ( +-  0.00% )
 9,465,299,000,193      instructions                     #    1.12  insn per cycle            
                                                  #    0.21  stalled cycles per insn     ( +-  0.00% )
 8,487,239,564,102      cycles                           #    2.050 GHz                         ( +-  0.21% )
 1,992,414,836,889      stalled-cycles-frontend          #   23.48% frontend cycles idle        ( +-  0.08% )
 2,127,019,426,911      branches                         #  513.642 M/sec                       ( +-  0.00% )
    66,698,031,504      branch-misses                    #    3.14% of all branches             ( +-  0.02% )

           129.408 +- 0.195 seconds time elapsed  ( +-  0.15% )

This is all within the noise.
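
To make the "within the noise" reading concrete, here is a rough sketch that
compares the two runs above using only the numbers perf printed. It assumes
the "+-" figures can be treated as independent uncertainties and combined in
quadrature; that is a simplification for illustration, not something perf
itself does, and the helper name within_noise() is made up for this example.

```python
import math


def within_noise(a, a_err, b, b_err):
    """Compare two measurements given absolute uncertainties.

    Returns (difference, combined uncertainty, verdict). The verdict is
    True when the difference does not exceed the combined uncertainty.
    Combining the errors in quadrature is an assumption of this sketch.
    """
    diff = abs(a - b)
    combined = math.hypot(a_err, b_err)
    return diff, combined, diff <= combined


# Elapsed time in seconds, copied from the two perf stat summaries.
base_time = (129.504, 0.127)       # tip/master fdebf9c0efe4
reverted_time = (129.408, 0.195)   # tip/master with 9628d19e91f1 reverted

# Cycle counts with their relative deviation in percent, converted to
# absolute uncertainties before comparing.
base_cycles = 8_476_070_185_458
reverted_cycles = 8_487_239_564_102
base_cycles_err = base_cycles * 0.12 / 100
reverted_cycles_err = reverted_cycles * 0.21 / 100

time_diff, time_noise, time_ok = within_noise(*base_time, *reverted_time)
cyc_diff, cyc_noise, cyc_ok = within_noise(
    base_cycles, base_cycles_err, reverted_cycles, reverted_cycles_err
)

print(f"elapsed: diff={time_diff:.3f}s noise={time_noise:.3f}s within={time_ok}")
print(f"cycles:  diff={cyc_diff:.3e} noise={cyc_noise:.3e} within={cyc_ok}")
```

Under these assumptions both the elapsed-time and the cycle-count differences
come out smaller than the combined run-to-run deviation.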

Or maybe building the kernel even with those "optimized" inlining decisions
due to the asm being of length 1 for atomic locking insns simply doesn't matter.

Or maybe I need a different benchmark.

At least it ain't breaking anything...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette