linux-kernel - Re: [PATCH v3 03/46] perf stat: Introduce skippable evsels

Message-ID: <d6784858-2a5f-7920-f1ac-d7ec9ed89605@linux.intel.com>
Date:   Mon, 1 May 2023 10:56:11 -0400
From:   "Liang, Kan" <kan.liang@...ux.intel.com>
To:     Ian Rogers <irogers@...gle.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Ahmad Yasin <ahmad.yasin@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Stephane Eranian <eranian@...gle.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Perry Taylor <perry.taylor@...el.com>,
        Samantha Alt <samantha.alt@...el.com>,
        Caleb Biggers <caleb.biggers@...el.com>,
        Weilin Wang <weilin.wang@...el.com>,
        Edward Baker <edward.baker@...el.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Florian Fischer <florian.fischer@...q.space>,
        Rob Herring <robh@...nel.org>,
        Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
        John Garry <john.g.garry@...cle.com>,
        Kajol Jain <kjain@...ux.ibm.com>,
        Sumanth Korikkar <sumanthk@...ux.ibm.com>,
        Thomas Richter <tmricht@...ux.ibm.com>,
        Tiezhu Yang <yangtiezhu@...ngson.cn>,
        Ravi Bangoria <ravi.bangoria@....com>,
        Leo Yan <leo.yan@...aro.org>,
        Yang Jihong <yangjihong1@...wei.com>,
        James Clark <james.clark@....com>,
        Suzuki Poulouse <suzuki.poulose@....com>,
        Kang Minchul <tegongkang@...il.com>,
        Athira Rajeev <atrajeev@...ux.vnet.ibm.com>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 03/46] perf stat: Introduce skippable evsels



On 2023-04-29 1:34 a.m., Ian Rogers wrote:
> Perf stat with no arguments will use default events and metrics. These
> events may fail to open even with kernel and hypervisor disabled. When
> these fail then the permissions error appears even though they were
> implicitly selected. This is particularly a problem with the automatic
> selection of the TopdownL1 metric group on certain architectures like
> Skylake:
> 
> '''
> $ perf stat true
> Error:
> Access to performance monitoring and observability operations is limited.
> Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
> access to performance monitoring and observability operations for processes
> without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
> More information can be found at 'Perf events and tool security' document:
> https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> perf_event_paranoid setting is 2:
>   -1: Allow use of (almost) all events by all users
>       Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> >= 0: Disallow raw and ftrace function tracepoint access
> >= 1: Disallow CPU event access
> >= 2: Disallow kernel profiling
> To make the adjusted perf_event_paranoid setting permanent preserve it
> in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
> '''
> 
> This patch adds skippable evsels that when they fail to open won't
> cause termination and will appear as "<not supported>" in output. The
> TopdownL1 events, from the metric group, are marked as skippable. This
> turns the failure above to:
> 
> '''
> $ perf stat perf bench internals synthesize
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
>   Average synthesis took: 49.287 usec (+- 0.083 usec)
>   Average num. events: 3.000 (+- 0.000)
>   Average time per event 16.429 usec
>   Average data synthesis took: 49.641 usec (+- 0.085 usec)
>   Average num. events: 11.000 (+- 0.000)
>   Average time per event 4.513 usec
> 
>  Performance counter stats for 'perf bench internals synthesize':
> 
>           1,222.38 msec task-clock:u                     #    0.993 CPUs utilized
>                  0      context-switches:u               #    0.000 /sec
>                  0      cpu-migrations:u                 #    0.000 /sec
>                162      page-faults:u                    #  132.529 /sec
>        774,445,184      cycles:u                         #    0.634 GHz                         (49.61%)
>      1,640,969,811      instructions:u                   #    2.12  insn per cycle              (59.67%)
>        302,052,148      branches:u                       #  247.102 M/sec                       (59.69%)
>          1,807,718      branch-misses:u                  #    0.60% of all branches             (59.68%)
>          5,218,927      CPU_CLK_UNHALTED.REF_XCLK:u      #    4.269 M/sec
>                                                   #     17.3 %  tma_frontend_bound
>                                                   #     56.4 %  tma_retiring
>                                                   #      nan %  tma_backend_bound
>                                                   #      nan %  tma_bad_speculation      (60.01%)
>        536,580,469      IDQ_UOPS_NOT_DELIVERED.CORE:u    #  438.965 M/sec                       (60.33%)
>    <not supported>      INT_MISC.RECOVERY_CYCLES_ANY:u
>          5,223,936      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u #    4.274 M/sec                       (40.31%)
>        774,127,250      CPU_CLK_UNHALTED.THREAD:u        #  633.297 M/sec                       (50.34%)
>      1,746,579,518      UOPS_RETIRED.RETIRE_SLOTS:u      #    1.429 G/sec                       (50.12%)
>      1,940,625,702      UOPS_ISSUED.ANY:u                #    1.588 G/sec                       (49.70%)
> 
>        1.231055525 seconds time elapsed
> 
>        0.258327000 seconds user
>        0.965749000 seconds sys


Which branch is this patch series based on?

I still cannot get the same output as the examples.

I'm using the latest perf-tools-next (the latest commit ID is
5d27a645f609 ("perf tracepoint: Fix memory leak in is_valid_tracepoint()")).
I only applied patch 2 and patch 3, since patch 1 is already merged.

The machine is a single-socket Cascade Lake with kernel 5.19.8.
$ uname -r
5.19.8-100.fc35.x86_64

As the output below shows, all the topdown-related events are displayed twice.

With root permission,

$ sudo ./perf stat perf bench internals synthesize
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 91.487 usec (+- 0.050 usec)
  Average num. events: 47.000 (+- 0.000)
  Average time per event 1.947 usec
  Average data synthesis took: 97.720 usec (+- 0.059 usec)
  Average num. events: 245.000 (+- 0.000)
  Average time per event 0.399 usec

 Performance counter stats for 'perf bench internals synthesize':

          2,077.81 msec task-clock                       #    0.998 CPUs utilized
               466      context-switches                 #  224.274 /sec
                 4      cpu-migrations                   #    1.925 /sec
               775      page-faults                      #  372.988 /sec
     9,561,957,326      cycles                           #    4.602 GHz                         (31.17%)
    24,466,854,021      instructions                     #    2.56  insn per cycle              (37.42%)
     5,547,892,196      branches                         #    2.670 G/sec                       (37.48%)
        37,880,526      branch-misses                    #    0.68% of all branches             (37.52%)
        49,576,109      CPU_CLK_UNHALTED.REF_XCLK        #   23.860 M/sec
                                                  #     59.9 %  tma_retiring
                                                  #      4.6 %  tma_bad_speculation      (37.47%)
       228,406,003      INT_MISC.RECOVERY_CYCLES_ANY     #  109.926 M/sec                       (37.52%)
        49,591,815      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE #   23.867 M/sec                       (24.99%)
     9,553,472,893      CPU_CLK_UNHALTED.THREAD          #    4.598 G/sec                       (31.25%)
    22,893,372,651      UOPS_RETIRED.RETIRE_SLOTS        #   11.018 G/sec                       (31.23%)
    24,180,375,299      UOPS_ISSUED.ANY                  #   11.637 G/sec                       (31.25%)
        49,562,300      CPU_CLK_UNHALTED.REF_XCLK        #   23.853 M/sec
                                                  #     28.1 %  tma_frontend_bound
                                                  #      7.2 %  tma_backend_bound        (31.24%)
    10,735,205,084      IDQ_UOPS_NOT_DELIVERED.CORE      #    5.167 G/sec                       (31.30%)
       228,798,426      INT_MISC.RECOVERY_CYCLES_ANY     #  110.115 M/sec                       (25.04%)
        49,559,962      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE #   23.852 M/sec                       (25.00%)
     9,538,354,333      CPU_CLK_UNHALTED.THREAD          #    4.591 G/sec                       (31.29%)
    24,207,967,071      UOPS_ISSUED.ANY                  #   11.651 G/sec                       (31.24%)

       2.082670856 seconds time elapsed

       0.812763000 seconds user
       1.252387000 seconds sys


Without root permission, nothing is counted for the TopdownL1 events.

$ ./perf stat perf bench internals synthesize
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 91.852 usec (+- 0.139 usec)
  Average num. events: 47.000 (+- 0.000)
  Average time per event 1.954 usec
  Average data synthesis took: 96.230 usec (+- 0.046 usec)
  Average num. events: 245.000 (+- 0.000)
  Average time per event 0.393 usec

 Performance counter stats for 'perf bench internals synthesize':

          2,051.95 msec task-clock:u                     #    0.997 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               765      page-faults:u                    #  372.816 /sec
     3,601,662,523      cycles:u                         #    1.755 GHz                         (16.72%)
     9,241,811,003      instructions:u                   #    2.57  insn per cycle              (33.43%)
     2,238,848,485      branches:u                       #    1.091 G/sec                       (50.06%)
        19,966,181      branch-misses:u                  #    0.89% of all branches             (66.77%)
     <not counted>      CPU_CLK_UNHALTED.REF_XCLK:u
   <not supported>      INT_MISC.RECOVERY_CYCLES_ANY:u
     <not counted>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u
     <not counted>      CPU_CLK_UNHALTED.THREAD:u
     <not counted>      UOPS_RETIRED.RETIRE_SLOTS:u
     <not counted>      UOPS_ISSUED.ANY:u
     <not counted>      CPU_CLK_UNHALTED.REF_XCLK:u
     <not counted>      IDQ_UOPS_NOT_DELIVERED.CORE:u
   <not supported>      INT_MISC.RECOVERY_CYCLES_ANY:u
     <not counted>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u
     <not counted>      CPU_CLK_UNHALTED.THREAD:u
     <not counted>      UOPS_ISSUED.ANY:u

       2.057691297 seconds time elapsed

       0.766640000 seconds user
       1.275170000 seconds sys
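
The duplication in the two outputs above can also be checked mechanically.
What follows is only a rough sketch, not part of perf: the regular
expression and the helper name duplicated_events() are my own assumptions
about the default "perf stat" text layout, and the parsing is a heuristic
that expects unwrapped counter lines.

```python
import re
from collections import Counter

# Heuristic pattern for one counter line of default `perf stat` text output:
# a value (a number, "<not counted>" or "<not supported>"), an optional
# "msec" unit, then the event name with an optional modifier such as ":u".
# This layout is an assumption based on the output pasted in this mail.
COUNTER_LINE = re.compile(
    r"^\s*(?P<value><not counted>|<not supported>|[\d,]+(?:\.\d+)?)\s+"
    r"(?:msec\s+)?(?P<event>[A-Za-z_][\w.\-]*(?::\w+)?)"
)


def duplicated_events(text):
    """Return {event_name: occurrences} for events listed more than once."""
    counts = Counter()
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        # Skip the trailing "seconds time elapsed / user / sys" summary lines,
        # which look like counter lines to the pattern above.
        if match and match.group("event") != "seconds":
            counts[match.group("event")] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

Since perf stat prints its counters to stderr, the text would have to be
captured from there (for example with "2> stat.txt") before being passed to
duplicated_events(). On the root output above it should report the repeated
topdown events, such as CPU_CLK_UNHALTED.REF_XCLK and
CPU_CLK_UNHALTED.THREAD, while UOPS_RETIRED.RETIRE_SLOTS and
IDQ_UOPS_NOT_DELIVERED.CORE each appear only once.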


Thanks,
Kan