Message-ID: <CA+icZUWSCS6vAQOXoG6nsW+Dbnogivzf+rmegCTMjz5hjE5cKQ@mail.gmail.com>
Date:   Sat, 13 Mar 2021 06:26:15 +0100
From:   Sedat Dilek <sedat.dilek@...il.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
        rostedt@...dmis.org, hpa@...or.com, torvalds@...uxfoundation.org,
        linux-kernel@...r.kernel.org, linux-toolchains@...r.kernel.org,
        jpoimboe@...hat.com, alexei.starovoitov@...il.com,
        mhiramat@...nel.org
Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Fri, Mar 12, 2021 at 10:00 PM Borislav Petkov <bp@...en8.de> wrote:
>
> On Fri, Mar 12, 2021 at 12:32:53PM +0100, Peter Zijlstra wrote:
> > Since ultimate performance of a 10 year old chip (Intel Sandy Bridge, 2011) is
> > simply irrelevant today, remove variable NOPs and use NOPL.
>
> Just ran them on my SNB box:
>
> cpu family      : 6
> model           : 45
> model name      : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
> stepping        : 7
>
> with the usual perf stat kernel build workload with
> CONFIG_DYNAMIC_FTRACE and CONFIG_FUNCTION_TRACER where each function has
> a NOP at its beginning when ftrace is disabled (thx Steve).
>
> ./tools/perf/perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j9 bzImage
>
> before: tip-master
>
>  Performance counter stats for 'make -s -j9 bzImage' (5 runs):
>
>       3,213,728.10 msec task-clock                #    7.307 CPUs utilized            ( +-  0.01% )
>            339,270      context-switches          #    0.106 K/sec                    ( +-  0.09% )
>             31,472      cpu-migrations            #    0.010 K/sec                    ( +-  0.64% )
>         62,070,684      page-faults               #    0.019 M/sec                    ( +-  0.01% )
> 11,498,198,009,323      cycles                    #    3.578 GHz                      ( +-  0.01% )  (83.33%)
>  8,235,957,366,696      stalled-cycles-frontend   #   71.63% frontend cycles idle     ( +-  0.01% )  (83.33%)
>  5,976,456,688,814      stalled-cycles-backend    #   51.98% backend cycles idle      ( +-  0.02% )  (66.67%)
>  7,553,156,344,376      instructions              #    0.66  insn per cycle
>                                                   #    1.09  stalled cycles per insn  ( +-  0.00% )  (83.33%)
>  1,635,468,917,524      branches                  #  508.901 M/sec                    ( +-  0.00% )  (83.34%)
>     51,888,292,932      branch-misses             #    3.17% of all branches          ( +-  0.02% )  (83.33%)
>
>            439.809 +- 0.156 seconds time elapsed  ( +-  0.04% )
>
>
> after: tip-master-nops
>
>  Performance counter stats for 'make -s -j9 bzImage' (5 runs):
>
>       3,217,113.67 msec task-clock                #    7.307 CPUs utilized            ( +-  0.03% )
>            339,425      context-switches          #    0.106 K/sec                    ( +-  0.20% )
>             31,724      cpu-migrations            #    0.010 K/sec                    ( +-  0.54% )
>         62,027,130      page-faults               #    0.019 M/sec                    ( +-  0.01% )
> 11,508,779,965,901      cycles                    #    3.577 GHz                      ( +-  0.03% )  (83.34%)
>  8,241,212,210,440      stalled-cycles-frontend   #   71.61% frontend cycles idle     ( +-  0.04% )  (83.33%)
>  5,982,615,533,177      stalled-cycles-backend    #   51.98% backend cycles idle      ( +-  0.06% )  (66.66%)
>  7,546,407,430,314      instructions              #    0.66  insn per cycle
>                                                   #    1.09  stalled cycles per insn  ( +-  0.00% )  (83.33%)
>  1,634,187,006,479      branches                  #  507.967 M/sec                    ( +-  0.00% )  (83.33%)
>     51,941,580,371      branch-misses             #    3.18% of all branches          ( +-  0.01% )  (83.33%)
>
>            440.266 +- 0.195 seconds time elapsed  ( +-  0.04% )
>
>
> So here's numbers talk, bullshit walks. And with those numbers no
> bullshit can remain lingering around anyway.
>
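
A rough way to read the two quoted elapsed times is to set their difference against the reported "+-" values. The sketch below does that arithmetic; treating each "+-" as the uncertainty of the mean and combining the two in quadrature are assumptions made here, not something stated in the thread.

```python
import math

# Elapsed times (seconds) and "+-" values copied from the quoted perf stat output.
before_mean, before_err = 439.809, 0.156   # tip-master
after_mean, after_err = 440.266, 0.195     # tip-master-nops


def compare_runs(m1, e1, m2, e2):
    """Return (absolute diff, diff in percent of m1, diff / combined error)."""
    diff = m2 - m1
    percent = diff / m1 * 100.0
    # Assumption: the two errors are independent, so they add in quadrature.
    combined = math.hypot(e1, e2)
    return diff, percent, diff / combined


diff, percent, ratio = compare_runs(before_mean, before_err, after_mean, after_err)

if __name__ == "__main__":
    print(f"difference: {diff:.3f} s ({percent:+.2f}%)")
    print(f"difference / combined error: {ratio:.2f}")
```

The difference is under half a second on a build of roughly 440 seconds, which matches the point of the quoted mail that the two kernels are practically indistinguishable on this workload.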

Here are my numbers.

My CPU:

cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz
stepping        : 7

My base was Linus Git:

$ git describe master
v5.12-rc2-338-gf78d76e72a46

I used Peter's patchset plus a required pre-patch so that it cleanly
applies against Linus Git:

x86/jump_label: Mark arguments as const to satisfy asm constraints
x86: Remove dynamic NOP selection
objtool,x86: Use asm/nops.h

My benchmark was building a Linux kernel with LLVM/Clang v12.0.0-rc3
on Debian/testing AMD64.

First build, with the patchset applied to the source:

 Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-7-amd64-clang12-cfi
KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
KBUILD_BUILD_USER=sedat.dilek@...il.com
KBUILD_BUILD_TIMESTAMP=2021-03-12 bindeb-pkg
KDEB_PKGVERSION=5.12.0~rc2-7~bullseye+dileks1':

      55605704.79 msec task-clock                #    3.568 CPUs utilized
          8317406      context-switches          #    0.150 K/sec
           261843      cpu-migrations            #    0.005 K/sec
        288312867      page-faults               #    0.005 M/sec
  107642573933061      cycles                    #    1.936 GHz
   82531165255218      stalled-cycles-frontend   #   76.67% frontend cycles idle
   64932777217096      stalled-cycles-backend    #   60.32% backend cycles idle
   59591288273663      instructions              #    0.55  insn per cycle
                                                 #    1.38  stalled cycles per insn
   10906545460023      branches                  #  196.141 M/sec
     489809039153      branch-misses             #    4.49% of all branches

  15582.829443660 seconds time elapsed

  53102.403996000 seconds user
   2547.134916000 seconds sys

Second build: I booted into the kernel built above (with the patchset
applied) and rebuilt the same code base:

 Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-8-amd64-clang12-cfi
KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
KBUILD_BUILD_USER=sedat.dilek@...il.com
KBUILD_BUILD_TIMESTAMP=2021-03-13 bindeb-pkg
KDEB_PKGVERSION=5.12.0~rc2-8~bullseye+dileks1':

      56976758.12 msec task-clock                #    3.589 CPUs utilized
          8334519      context-switches          #    0.146 K/sec
           269340      cpu-migrations            #    0.005 K/sec
        288451841      page-faults               #    0.005 M/sec
  110795226760909      cycles                    #    1.945 GHz
   85643743105935      stalled-cycles-frontend   #   77.30% frontend cycles idle
   68146424096780      stalled-cycles-backend    #   61.51% backend cycles idle
   59559370217381      instructions              #    0.54  insn per cycle
                                                 #    1.44  stalled cycles per insn
   10902087911812      branches                  #  191.343 M/sec
     490447660403      branch-misses             #    4.50% of all branches

  15875.267204283 seconds time elapsed

  54502.552543000 seconds user
   2519.914516000 seconds sys

Simply comparing the elapsed build times:
~15583 s vs. ~15875 s, i.e. approx. 292 s (about 5 min, or 1.9%) more
build time.
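
The same comparison can be extended to the other counters. The following is only a sketch that copies a few figures from the two perf stat reports above and prints the relative change; the dictionary keys and the helper names (`relative_change`, `compare`) are made up for illustration. Each configuration was measured once, so the run-to-run noise is not known from these numbers.

```python
# Figures copied from the two perf stat reports above.
first = {
    "elapsed_s": 15582.829443660,
    "task_clock_ms": 55605704.79,
    "cycles": 107642573933061,
    "instructions": 59591288273663,
}
second = {
    "elapsed_s": 15875.267204283,
    "task_clock_ms": 56976758.12,
    "cycles": 110795226760909,
    "instructions": 59559370217381,
}


def relative_change(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0


def compare(a, b):
    """Relative change per counter, for counters present in both reports."""
    return {name: relative_change(a[name], b[name]) for name in a if name in b}


changes = compare(first, second)
extra_seconds = second["elapsed_s"] - first["elapsed_s"]

if __name__ == "__main__":
    print(f"extra wall-clock time: {extra_seconds:.1f} s "
          f"({extra_seconds / 60:.1f} min)")
    for name, pct in changes.items():
        print(f"{name:>14}: {pct:+.2f}%")
```

The instruction count is nearly identical between the two builds, while cycles and task-clock are higher in the second one by a few percent.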

Attached are my linux-configs and above mentioned build-times (in case
Gmail has truncated them).

- Sedat -

View attachment "build-time_5.12.0-rc2-7-amd64-clang12-cfi.txt" of type "text/plain" (1344 bytes)

Download attachment "config-5.12.0-rc2-7-amd64-clang12-cfi" of type "application/octet-stream" (239393 bytes)

Download attachment "config-5.12.0-rc2-8-amd64-clang12-cfi" of type "application/octet-stream" (239393 bytes)

View attachment "build-time_5.12.0-rc2-8-amd64-clang12-cfi.txt" of type "text/plain" (1344 bytes)
