linux-kernel - [RFC PATCH 00/25] Perf stat metric grouping with hardware information

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230925061824.3818631-1-weilin.wang@intel.com>
Date:   Sun, 24 Sep 2023 23:17:59 -0700
From:   weilin.wang@...el.com
To:     weilin.wang@...el.com, Ian Rogers <irogers@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Kan Liang <kan.liang@...ux.intel.com>
Cc:     linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        Perry Taylor <perry.taylor@...el.com>,
        Samantha Alt <samantha.alt@...el.com>,
        Caleb Biggers <caleb.biggers@...el.com>,
        Mark Rutland <mark.rutland@....com>
Subject: [RFC PATCH 00/25] Perf stat metric grouping with hardware information

From: Weilin Wang <weilin.wang@...el.com>

Perf stat metric grouping generates event groups that are provided to kernel for
data collection using the hardware counters. Sometimes, the grouping might fail
and kernel has to retry the groups because generated groups do not fit in the
hardware counters correctly. In some other cases, the groupings are collected
correctly, however, they left some hardware counters unused.

To improve these inefficiencies, we would like to propose a hardware aware
grouping method that does metric/event grouping based on event counter
restriction rules and the availability of hardware counters in the system. This
method is generic as long as all the restriction rules could be provided from
the pmu-event JSON files.

This patch set includes code that does hardware aware grouping and updated
pmu-event JSON files for four platforms (SapphireRapids, Icelakex, Cascadelakex,
and Tigerlake) for your testing and experimenting. We've successfully tested
these patches on three platforms (SapphireRapids, Icelakex, and Cascadelakex)
with topdown metrics from TopdownL1 to TopdownL6.

There are some optimization opportunities that we might implement in the future:
1) Better NMI hanlding: when NMI watchdog is enabled, we reduce the default_core
total counter size by one. This could be improved to better utilize the counter.
2) Fill important events into unused counter for better counter utlization:
there might be some unused counters scattered in the groups. We could consider
to add important events in this slots if necessary. This could help increase the
multiplexing percentage and help improve accuracy if the event is critical.

Remaining questions for dicussion:
3) Where to start grouping from? The current implementation start grouping by
combining all the events into a single list. This step deduplicates events. But
it does not maintain the relationship of events according to the metrics, i.e.
events required by one metric may not be collected at the same time. Another
type of starting point would be grouping each individual metric and then try to
merge the groups.
4) Any comments, suggestions, new ideas?
5) If you are interested to test the patch out and the pmu-event JSON files of
your testing platform is not provided here, please let me know so that I could
provide you the files.


Weilin Wang (25):
  perf stat: Add hardware-grouping cmd option to perf stat
  perf stat: Add basic functions for the hardware-grouping stat cmd
    option
  perf pmu-events: Add functions in jevent.py
  perf pmu-events: Add counter info into JSON files for SapphireRapids
  perf pmu-events: Add event counter data for Cascadelakex
  perf pmu-events: Add event counter data for Icelakex
  perf stat: Add helper functions for hardware-grouping method
  perf stat: Add functions to get counter info
  perf stat: Add helper functions for hardware-grouping method
  perf stat: Add helper functions to hardware-grouping method
  perf stat: Add utility functions to hardware-grouping method
  perf stat: Add more functions for hardware-grouping method
  perf stat: Add functions to hardware-grouping method
  perf stat: Add build string function and topdown events handling in
    hardware-grouping
  perf stat: Add function to combine metrics for hardware-grouping
  perf stat: Update keyword core to default_core to adjust to the
    changes for events with no unit
  perf stat: Handle taken alone in hardware-grouping
  perf stat: Handle NMI in hardware-grouping
  perf stat: Handle grouping method fall back in hardware-grouping
  perf stat: Code refactoring in hardware-grouping
  perf stat: Add tool events support in hardware-grouping
  perf stat: Add TSC support in hardware-grouping
  perf stat: Fix a return error issue in hardware-grouping
  perf stat: Add check to ensure correctness in platform that does not
    support hardware-grouping
  perf pmu-events: Add event counter data for Tigerlake

 tools/lib/bitmap.c                            |   20 +
 tools/perf/builtin-stat.c                     |    7 +
 .../arch/x86/cascadelakex/cache.json          | 1237 ++++++++++++
 .../arch/x86/cascadelakex/counter.json        |   17 +
 .../arch/x86/cascadelakex/floating-point.json |   16 +
 .../arch/x86/cascadelakex/frontend.json       |   68 +
 .../arch/x86/cascadelakex/memory.json         |  751 ++++++++
 .../arch/x86/cascadelakex/other.json          |  168 ++
 .../arch/x86/cascadelakex/pipeline.json       |  102 +
 .../arch/x86/cascadelakex/uncore-cache.json   | 1138 +++++++++++
 .../x86/cascadelakex/uncore-interconnect.json | 1272 +++++++++++++
 .../arch/x86/cascadelakex/uncore-io.json      |  394 ++++
 .../arch/x86/cascadelakex/uncore-memory.json  |  509 +++++
 .../arch/x86/cascadelakex/uncore-power.json   |   25 +
 .../arch/x86/cascadelakex/virtual-memory.json |   28 +
 .../pmu-events/arch/x86/icelakex/cache.json   |   98 +
 .../pmu-events/arch/x86/icelakex/counter.json |   17 +
 .../arch/x86/icelakex/floating-point.json     |   13 +
 .../arch/x86/icelakex/frontend.json           |   55 +
 .../pmu-events/arch/x86/icelakex/memory.json  |   53 +
 .../pmu-events/arch/x86/icelakex/other.json   |   52 +
 .../arch/x86/icelakex/pipeline.json           |   92 +
 .../arch/x86/icelakex/uncore-cache.json       |  965 ++++++++++
 .../x86/icelakex/uncore-interconnect.json     | 1667 +++++++++++++++++
 .../arch/x86/icelakex/uncore-io.json          |  966 ++++++++++
 .../arch/x86/icelakex/uncore-memory.json      |  186 ++
 .../arch/x86/icelakex/uncore-power.json       |   26 +
 .../arch/x86/icelakex/virtual-memory.json     |   22 +
 .../arch/x86/sapphirerapids/cache.json        |  104 +
 .../arch/x86/sapphirerapids/counter.json      |   17 +
 .../x86/sapphirerapids/floating-point.json    |   25 +
 .../arch/x86/sapphirerapids/frontend.json     |   98 +-
 .../arch/x86/sapphirerapids/memory.json       |   44 +
 .../arch/x86/sapphirerapids/other.json        |   40 +
 .../arch/x86/sapphirerapids/pipeline.json     |  118 ++
 .../arch/x86/sapphirerapids/uncore-cache.json |  534 +++++-
 .../arch/x86/sapphirerapids/uncore-cxl.json   |   56 +
 .../sapphirerapids/uncore-interconnect.json   |  476 +++++
 .../arch/x86/sapphirerapids/uncore-io.json    |  373 ++++
 .../x86/sapphirerapids/uncore-memory.json     |  391 ++++
 .../arch/x86/sapphirerapids/uncore-power.json |   24 +
 .../x86/sapphirerapids/virtual-memory.json    |   20 +
 .../pmu-events/arch/x86/tigerlake/cache.json  |   65 +
 .../arch/x86/tigerlake/counter.json           |    7 +
 .../arch/x86/tigerlake/floating-point.json    |   13 +
 .../arch/x86/tigerlake/frontend.json          |   56 +
 .../pmu-events/arch/x86/tigerlake/memory.json |   31 +
 .../pmu-events/arch/x86/tigerlake/other.json  |    4 +
 .../arch/x86/tigerlake/pipeline.json          |   96 +
 .../x86/tigerlake/uncore-interconnect.json    |   11 +
 .../arch/x86/tigerlake/uncore-memory.json     |    6 +
 .../arch/x86/tigerlake/uncore-other.json      |    1 +
 .../arch/x86/tigerlake/virtual-memory.json    |   20 +
 tools/perf/pmu-events/jevents.py              |  179 +-
 tools/perf/pmu-events/pmu-events.h            |   26 +-
 tools/perf/util/metricgroup.c                 |  927 +++++++++
 tools/perf/util/metricgroup.h                 |   82 +
 tools/perf/util/pmu.c                         |    5 +
 tools/perf/util/pmu.h                         |    1 +
 tools/perf/util/stat.h                        |    1 +
 60 files changed, 13790 insertions(+), 25 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/x86/cascadelakex/counter.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelakex/counter.json
 create mode 100644 tools/perf/pmu-events/arch/x86/sapphirerapids/counter.json
 create mode 100644 tools/perf/pmu-events/arch/x86/tigerlake/counter.json

--
2.39.3