Message-ID: <49c6fccb-b716-1bf0-18a6-cace1cdb66b9@huawei.com>
Date: Wed, 9 Jun 2021 11:23:51 +0100
From: John Garry <john.garry@...wei.com>
To: Ian Rogers <irogers@...gle.com>
CC: Arnaldo Carvalho de Melo <acme@...nel.org>,
Jiri Olsa <jolsa@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>
Subject: Re: perf tool: Issues with metricgroups
Hi Ian,

On 09/06/2021 07:15, Ian Rogers wrote:
> The fix to avoid uncore_ events being deduplicated against each other
> added complexity to the code and means that metric-no-group doesn't
> really work any more. I have it on my list of things to look at. It
> relates to what you are looking at as the deduplication afterward is
> tricky given the funny invariants on evsel names. I think it would be
> easier to deduplicate events before doing the event parse. It may also
> be good to change evsels so that they own the string for their name
> (this would mean uncore_imc events could have unique names and not get
> deduplicated against each other). The invariants around cycles in your
> change look weird, but I can see how it might workaround an issue. My
> attempts to reproduce the issue weren't successful on a SkylakeX.
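(As an aside, the dedup-before-parse idea above could amount to a simple order-preserving unique pass over each metric's event list before the combined string is handed to the event parser. A rough sketch in Python, not perf's actual code, with made-up names and the event lists taken from the log below:)

```python
# Sketch only (not perf's real code): deduplicate the events that a
# set of metrics reference, keeping first-seen order, before the
# combined list is passed to event parsing.
def dedup_metric_events(metric_event_lists):
    seen = set()
    unique = []
    for events in metric_event_lists:
        for name in events:
            if name not in seen:
                seen.add(name)
                unique.append(name)
    return unique

metrics = [
    ["uops_retired.retire_slots", "cycles"],          # Retiring
    ["uops_issued.any", "cycles",                     # Backend_Bound
     "idq_uops_not_delivered.core",
     "int_misc.recovery_cycles",
     "uops_retired.retire_slots"],
]
print(dedup_metric_events(metrics))
```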
I am a bit surprised that you could not reproduce on SkylakeX, as the
metric expressions are the same.
As an experiment I hacked the mapfile.csv to make my broadwell machine
pick up the skylakex pmu-events:
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 5f5df6560202..3f170fc430b2 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -1,6 +1,6 @@
Family-model,Version,Filename,EventType
GenuineIntel-6-56,v5,broadwellde,core
-GenuineIntel-6-3D,v17,broadwell,core
+GenuineIntel-6-3D,v17,skylakex,core
GenuineIntel-6-47,v17,broadwell,core
GenuineIntel-6-4F,v10,broadwellx,core
GenuineIntel-6-1C,v4,bonnell,core
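(The hack works because pmu-events selection is essentially a lookup of the machine's CPUID string against the Family-model column. A rough Python approximation, not perf's actual C code, using a prefix match as a stand-in for the real matching logic:)

```python
import csv
import io

# Rough sketch (not perf's real code): mapfile.csv maps a CPUID string
# like "GenuineIntel-6-3D-4" to the pmu-events table to load; the hack
# above points the 6-3D (broadwell) row at the skylakex events.
MAPFILE = """Family-model,Version,Filename,EventType
GenuineIntel-6-56,v5,broadwellde,core
GenuineIntel-6-3D,v17,skylakex,core
GenuineIntel-6-47,v17,broadwell,core
"""

def lookup_events_table(cpuid):
    for row in csv.DictReader(io.StringIO(MAPFILE)):
        if cpuid.startswith(row["Family-model"]):
            return row["Filename"]
    return None

print(lookup_events_table("GenuineIntel-6-3D-4"))  # skylakex after the hack
```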
And I still see the issue:
john@...alhost:~/acme/tools/perf> sudo ./perf stat -v -M retiring,backend_bound sleep 1
Using CPUID GenuineIntel-6-3D-4
metric expr uops_retired.retire_slots / (4 * cycles) for Retiring
found event cycles
found event uops_retired.retire_slots
metric expr 1 - ( (idq_uops_not_delivered.core / (4 * cycles)) + ((
uops_issued.any - uops_retired.retire_slots + 4 *
int_misc.recovery_cycles ) / (4 * cycles)) + (uops_retired.retire_slots
/ (4 * cycles)) ) for Backend_Bound
found event uops_issued.any
found event cycles
found event idq_uops_not_delivered.core
found event int_misc.recovery_cycles
found event uops_retired.retire_slots
adding
{cycles,uops_retired.retire_slots}:W,{uops_issued.any,cycles,idq_uops_not_delivered.core,int_misc.recovery_cycles,uops_retired.retire_slots}:W
uops_retired.retire_slots -> cpu/(null)=0x1e8483,umask=0x2,event=0xc2/
uops_issued.any -> cpu/(null)=0x1e8483,umask=0x1,event=0xe/
idq_uops_not_delivered.core -> cpu/(null)=0x1e8483,umask=0x1,event=0x9c/
int_misc.recovery_cycles -> cpu/(null)=0x1e8483,umask=0x1,event=0xd/
uops_retired.retire_slots -> cpu/(null)=0x1e8483,umask=0x2,event=0xc2/
Control descriptor is not initialized
cycles: 1648306 533003 533003
uops_retired.retire_slots: 1309840 533003 533003
uops_issued.any: 0 533003 0
cycles: 0 533003 0
idq_uops_not_delivered.core: 0 533003 0
int_misc.recovery_cycles: 0 533003 0
uops_retired.retire_slots: 0 533003 0
Performance counter stats for 'sleep 1':

        1,648,306      cycles                      # 0.20 Retiring
        1,309,840      uops_retired.retire_slots
    <not counted>      uops_issued.any                         (0.00%)
    <not counted>      cycles                                  (0.00%)
    <not counted>      idq_uops_not_delivered.core             (0.00%)
    <not counted>      int_misc.recovery_cycles                (0.00%)
    <not counted>      uops_retired.retire_slots               (0.00%)
1.000942715 seconds time elapsed
0.000954000 seconds user
0.000000000 seconds sys
The events in group usually have to be from the same PMU. Try
reorganizing the group.
john@...alhost:~/acme/tools/perf>
>
> Thanks for reporting the issues. I planned to look at this logic to
> fix metric-no-group, it'd be nice to land:
> https://lore.kernel.org/lkml/20210112230434.2631593-1-irogers@google.com/
> just so that I'm not making patch sets that conflict with myself.
As I said, one issue was caused by me, and I can send a fix; I need to
test more, though. I was holding off until an approach was decided for
the 2nd issue, but since there's no resolution yet, I think I'll just
send the fix today.
Thanks,
John