Message-ID: <677c6ec7-2e01-635b-dbfb-fbb9280e5b7c@linux.intel.com>
Date:   Tue, 4 Oct 2022 10:28:45 -0700
From:   Andi Kleen <ak@...ux.intel.com>
To:     Ian Rogers <irogers@...gle.com>,
        Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
        Kan Liang <kan.liang@...ux.intel.com>, perry.taylor@...el.com,
        caleb.biggers@...el.com, kshipra.bopardikar@...el.com,
        samantha.alt@...el.com, ahmad.yasin@...el.com,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Cc:     Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH v3 00/23] Improvements to Intel perf metrics

[cutting down cc list]


On 10/3/2022 8:43 PM, Ian Rogers wrote:
> On Mon, Oct 3, 2022 at 7:16 PM Ian Rogers <irogers@...gle.com> wrote:
>> For consistency with:
>> https://github.com/intel/perfmon-metrics
>> rename of topdown TMA metrics from Frontend_Bound to tma_frontend_bound.
>>
>> _SMT suffix metrics are dropped as the #SMT_On and #EBS_Mode
>> are correctly expanded in the single main metric. Fix perf expr to
>> allow a double if to be correctly processed.
>>
>> Add all 6 levels of TMA metrics. Child metrics are placed in a group
>> named after their parent allowing children of a metric to be
>> easily measured using the metric name with a _group suffix.
>>
>> Don't drop TMA metrics if they contain topdown events.
>>
>> The ## and ##? operators are correctly expanded.
>>
>> The locate-with column is added to the long description describing a
>> sampling event.
>>
>> Metrics are written in terms of other metrics to reduce the expression
>> size and increase readability.
>>
>> Following this the pmu-events/arch/x86 directories match those created
>> by the script at:
>> https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
>> with updates at:
>> https://github.com/captain5050/event-converter-for-linux-perf
>>
>>
>> v3. Fix a parse metrics test failure due to making metrics referring
>>      to other metrics case sensitive - make the cases in the test
>>      metric match.
>> v2. Fixes commit message wrt missing mapfile.csv updates as noted by
>>      Zhengjun Xing <zhengjun.xing@...ux.intel.com>. ScaleUnit is added
>>      for TMA metrics. Metrics with topdown events have a missing
>>      slots event added if necessary. The latest metrics at:
>>      https://github.com/intel/perfmon-metrics are used, however, the
>>      event-converter-for-linux-perf scripts now prefer their own
>>      metrics in case of mismatched units when a metric is written in
>>      terms of another.  Additional testing was performed on broadwell,
>>      broadwellde, cascadelakex, haswellx, sapphirerapids and tigerlake
>>      CPUs.
> I wrote up a little example of performing a top-down analysis for the
> perf wiki here:
> https://perf.wiki.kernel.org/index.php/Top-Down_Analysis


I did some quick testing.

On Skylake the output of L1 isn't scaled to percent:

$ ./perf stat -M TopdownL1 ~/pmu/pmu-tools/workloads/BC1s

  Performance counter stats for '/home/ak/pmu/pmu-tools/workloads/BC1s':

        608,066,701      INT_MISC.RECOVERY_CYCLES         #     0.32 Bad_Speculation          (50.02%)
      5,364,230,382      CPU_CLK_UNHALTED.THREAD          #     0.48 Retiring                 (50.02%)
     10,194,062,626      UOPS_RETIRED.RETIRE_SLOTS                                            (50.02%)
     14,613,100,390      UOPS_ISSUED.ANY                                                      (50.02%)
      2,928,793,077      IDQ_UOPS_NOT_DELIVERED.CORE      #     0.14 Frontend_Bound
                                                          #     0.07 Backend_Bound            (50.02%)
        604,850,703      INT_MISC.RECOVERY_CYCLES                                             (50.02%)
      5,357,291,185      CPU_CLK_UNHALTED.THREAD                                              (50.02%)
     14,618,285,580      UOPS_ISSUED.ANY                                                      (50.02%)
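
For illustration only, this is roughly what the same four values would look like scaled to percent. It is a minimal Python sketch, not what perf does internally: the ratios are copied from the output above, and the helper names to_percent and format_percent are hypothetical.

```python
# Ratios copied verbatim from the Skylake TopdownL1 output above.
topdown_l1 = {
    "Bad_Speculation": 0.32,
    "Retiring": 0.48,
    "Frontend_Bound": 0.14,
    "Backend_Bound": 0.07,
}


def to_percent(ratio: float) -> float:
    """Scale a 0..1 ratio to a percentage, rounded to one decimal place."""
    return round(ratio * 100.0, 1)


def format_percent(metrics: dict) -> list:
    """Return one display line per metric (hypothetical layout, not perf's)."""
    return [f"{to_percent(value):5.1f} %  {name}" for name, value in metrics.items()]


for line in format_percent(topdown_l1):
    print(line)
```

The four ratios add up to about 1.0 (1.01 here because each printed value is already rounded), which is why a percent presentation would be the natural one for level 1.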

Then if I follow the wiki example here I would expect I need to do

$ ./perf stat -M tma_backend_bound_group ~/pmu/pmu-tools/workloads/BC1s

Cannot find metric or group `tma_backend_bound_group'

but tma_retiring_group doesn't exist. So it seems the methodology isn't 
fully consistent everywhere? Perhaps the wiki needs to document the 
supported CPUs and also what part of the hierarchy is supported.
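
The naming convention described in the quoted cover letter (Frontend_Bound becomes tma_frontend_bound, and the children of a metric are reachable through the parent name plus a _group suffix) can be sketched as below. This is only an illustration of the naming rule: the function names are hypothetical, and the sketch says nothing about whether a given group is actually defined for a given CPU, which is the gap seen above.

```python
def tma_metric_name(old_name: str) -> str:
    """Map an old-style name such as 'Frontend_Bound' to 'tma_frontend_bound'."""
    lowered = old_name.lower()
    # Avoid a double prefix if the name is already in the new style.
    if lowered.startswith("tma_"):
        return lowered
    return "tma_" + lowered


def tma_group_name(parent: str) -> str:
    """Name of the group expected to hold the children of `parent`."""
    return tma_metric_name(parent) + "_group"


for parent in ("Frontend_Bound", "Backend_Bound", "Retiring"):
    print(parent, "->", tma_group_name(parent))
```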

Another problem I noticed in the example is that the sample event didn't 
specify PEBS, even though it probably should, at least on Icelake+ where 
every event can be used with PEBS at lower overhead.

Also, with all these groups that need to be specified by hand, some bash 
completion support for groups would be really useful.

-Andi

