[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y8k6la+oWXm+VSX0@kernel.org>
Date: Thu, 19 Jan 2023 09:41:57 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Jing Zhang <renyu.zj@...ux.alibaba.com>
Cc: John Garry <john.g.garry@...cle.com>,
Ian Rogers <irogers@...gle.com>,
Xing Zhengjun <zhengjun.xing@...ux.intel.com>,
Will Deacon <will@...nel.org>,
James Clark <james.clark@....com>,
Mike Leach <mike.leach@...aro.org>,
Leo Yan <leo.yan@...aro.org>,
linux-arm-kernel@...ts.infradead.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Andrew Kilroy <andrew.kilroy@....com>,
Shuai Xue <xueshuai@...ux.alibaba.com>,
Zhuo Song <zhuo.song@...ux.alibaba.com>
Subject: Re: [PATCH v8 0/9] Add metrics for neoverse-n2-v2
Em Tue, Jan 17, 2023 at 03:29:24PM +0800, Jing Zhang escreveu:
> Changes since v7:
> - Change pmu_core__find_same to pmu__find_core_pmu as suggested by Ian.
> - Returned NAN when can't find a pmu or the value from */caps/slots is
> zero, in perf_pmu__cpu_slots_per_cycle, as suggested by John.
> - Link: https://lore.kernel.org/all/1673601740-122788-1-git-send-email-renyu.zj@linux.alibaba.com/
Thanks, applied.
- Arnaldo
> Changes since v6:
> - Split patch 1 into 3 smaller patches as suggested by Ian.
> - Change perf_pmu__get_slots to perf_pmu__cpu_slots_per_cycle,
> per John's suggestion;
> - Return NAN instead of 0 in perf_pmu__cpu_slots_per_cycle weak
> function, per John's suggestion;
> - Factor out pmu_core__find_same function, per John's suggestion.
> - Link: https://lore.kernel.org/all/1673017529-1429208-1-git-send-email-renyu.zj@linux.alibaba.com/
>
> Changes since v5:
> - Add common topdownL1 metrics in sbsa.json as suggested by John;
> - Correct PKI/MPKI ScaleUnit to 1PKI/1MPKI;
> - Link: https://lore.kernel.org/all/1672745976-2800146-1-git-send-email-renyu.zj@linux.alibaba.com/
>
> Changes since v4:
> - Add MPKI/PKI “ScaleUnit”;
> - Add acked-by from Ian Rogers;
> - Link: https://lore.kernel.org/all/1671799045-1108027-1-git-send-email-renyu.zj@linux.alibaba.com/
>
> Changes since v3:
> - Add ipc_rate metric;
> - Drop the PublicDescription;
> - Describe PEutilization metrics in more detail;
> - Link: https://lore.kernel.org/all/1669310088-13482-1-git-send-email-renyu.zj@linux.alibaba.com/
>
> Changes since v2:
> - Correct the furmula of Branch metrics;
> - Add more PE utilization metrics;
> - Add more TLB metrics;
> - Add “ScaleUnit” for some metrics;
> - Add a newline at the end of the file;
> - Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/
>
> Changes since v1:
> - Corrected formula for topdown L1 due to wrong counts for stall_slot
> and stall_slot_frontend;
> - Link: https://lore.kernel.org/all/1667214694-89839-1-git-send-email-renyu.zj@linux.alibaba.com/
>
> This series does the following things:
>
> The slots in each architecture may be different, so add #slots literal
> to obtain the slots of different architectures, and the #slots can be
> applied in the topdown metric. Currently, The #slots just support for
> arm64, and other architectures will return NAN.
>
> The metrics of topdown L1 are from ARM sbsa7.0 platform design doc[0],
> D37-38, which are standard. So put them in the common file sbsa.json of
> arm64 and add general metric support, so that other cores besides n2/v2
> can also be reused.
>
> Then add topdownL1 metric for neoverse-n2-v2, and due to the wrong count
> of stall_slot and stall_slot_frontend on neoverse-n2, the real stall_slot
> and real stall_slot_frontend need to subtract cpu_cycles, so overwrite
> the "MetricExpr" for neoverse-n2.
> Reference from ARM neoverse-n2 errata notice [1], D117.
>
> Since neoverse-n2/neoverse-v2 does not yet support topdown L2, metricgroups
> such as Cache, TLB, Branch, InstructionsMix, and PEutilization will be added
> to further analysis of performance bottlenecks in the following patches.
> Reference from ARM PMU guide [2][3].
>
> [0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
> [1] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
> [2] https://documentation-service.arm.com/static/628f8fa3dfaf015c2b76eae8?token=
> [3] https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=
>
> Tested in neoverse-n2:
>
> $./perf list
> ...
> Metric Groups:
>
> Branch:
> branch_miss_pred_rate
> [The rate of branches mis-predited to the overall branches]
> branch_mpki
> [The rate of branches mis-predicted per kilo instructions]
> branch_pki
> [The rate of branches retired per kilo instructions]
> Cache:
> l1d_cache_miss_rate
> [The rate of L1 D-Cache misses to the overall L1 D-Cache]
> l1d_cache_mpki
> [The rate of L1 D-Cache misses per kilo instructions]
> ...
>
>
> $sudo ./perf stat -M TLB false_sharing 2
>
> Performance counter stats for 'false_sharing 2':
>
> 29,940 L2D_TLB # 20.0 % l2_tlb_miss_rate (42.36%)
> 5,998 L2D_TLB_REFILL (42.36%)
> 1,753 L1I_TLB_REFILL # 0.1 % l1i_tlb_miss_rate (43.17%)
> 2,173,957 L1I_TLB (43.17%)
> 327,944,763 L1D_TLB # 0.0 % l1d_tlb_miss_rate (43.98%)
> 22,485 L1D_TLB_REFILL (43.98%)
> 497,210 L1I_TLB # 0.0 % itlb_walk_rate (44.83%)
> 28 ITLB_WALK (44.83%)
> 821,488,762 INST_RETIRED # 0.0 MPKI itlb_mpki (43.97%)
> 122 ITLB_WALK (43.97%)
> 744 DTLB_WALK # 0.0 % dtlb_walk_rate (43.01%)
> 263,913,146 L1D_TLB (43.01%)
> 779,073,875 INST_RETIRED # 0.0 MPKI dtlb_mpki (42.07%)
> 1,050 DTLB_WALK (42.07%)
>
> 0.435864901 seconds time elapsed
>
> 1.201384000 seconds user
> 0.000000000 seconds sys
>
>
> $sudo ./perf stat -M TopDownL1 false_sharing 2
>
> Performance counter stats for 'false_sharing 2':
>
> 3,408,960,257 cpu_cycles # 0.0 % bad_speculation
> # 5.1 % retiring (66.79%)
> 19,576,079,610 stall_slot (66.79%)
> 877,673,452 op_spec (66.79%)
> 876,324,270 op_retired (66.79%)
> 3,406,548,064 cpu_cycles # 26.7 % frontend_bound (67.08%)
> 7,961,814,801 stall_slot_frontend (67.08%)
> 3,415,528,440 cpu_cycles # 68.8 % backend_bound (66.43%)
> 11,746,647,747 stall_slot_backend (66.43%)
>
> 0.455229807 seconds time elapsed
>
> 1.243216000 seconds user
> 0.000000000 seconds sys
>
> $sudo ./perf stat -M branch sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 901,495 INST_RETIRED # 223.6 PKI branch_pki
> 201,603 BR_RETIRED
> 901,495 INST_RETIRED # 10.0 MPKI branch_mpki
> 9,004 BR_MIS_PRED_RETIRED
> 9,004 BR_MIS_PRED_RETIRED # 4.5 % branch_miss_pred_rate
> 201,603 BR_RETIRED
>
> 1.000794467 seconds time elapsed
>
> 0.000905000 seconds user
> 0.000000000 seconds sys
>
>
> Jing Zhang (9):
> perf pmu: Add #slots literal support for arm64
> perf jevent: Add general metrics support
> perf vendor events arm64: Add common topdown L1 metrics
> perf vendor events arm64: Add topdown L1 metrics for neoverse-n2-v2
> perf vendor events arm64: Add TLB metrics for neoverse-n2-v2
> perf vendor events arm64: Add cache metrics for neoverse-n2-v2
> perf vendor events arm64: Add branch metrics for neoverse-n2-v2
> perf vendor events arm64: Add PE utilization metrics for
> neoverse-n2-v2
> perf vendor events arm64: Add instruction mix metrics for
> neoverse-n2-v2
>
> tools/perf/arch/arm64/util/pmu.c | 35 ++-
> .../arch/arm64/arm/neoverse-n2-v2/metrics.json | 273 +++++++++++++++++++++
> tools/perf/pmu-events/arch/arm64/sbsa.json | 30 +++
> tools/perf/pmu-events/jevents.py | 2 +
> tools/perf/util/expr.c | 5 +
> tools/perf/util/pmu.c | 6 +
> tools/perf/util/pmu.h | 1 +
> 7 files changed, 350 insertions(+), 2 deletions(-)
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/sbsa.json
>
> --
> 1.8.3.1
--
- Arnaldo
Powered by blists - more mailing lists