lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BYAPR11MB2807A842BCDCE8111D457378EC659@BYAPR11MB2807.namprd11.prod.outlook.com>
Date:   Wed, 26 Apr 2023 23:26:44 +0000
From:   "Yasin, Ahmad" <ahmad.yasin@...el.com>
To:     Ian Rogers <irogers@...gle.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        "Eranian, Stephane" <eranian@...gle.com>,
        Andi Kleen <ak@...ux.intel.com>,
        "Taylor, Perry" <perry.taylor@...el.com>,
        "Alt, Samantha" <samantha.alt@...el.com>,
        "Biggers, Caleb" <caleb.biggers@...el.com>,
        "Wang, Weilin" <weilin.wang@...el.com>,
        "Baker, Edward" <edward.baker@...el.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        "Namhyung Kim" <namhyung@...nel.org>,
        "Hunter, Adrian" <adrian.hunter@...el.com>,
        Florian Fischer <florian.fischer@...q.space>,
        Rob Herring <robh@...nel.org>,
        Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
        John Garry <john.g.garry@...cle.com>,
        Kajol Jain <kjain@...ux.ibm.com>,
        Sumanth Korikkar <sumanthk@...ux.ibm.com>,
        Thomas Richter <tmricht@...ux.ibm.com>,
        Tiezhu Yang <yangtiezhu@...ngson.cn>,
        Ravi Bangoria <ravi.bangoria@....com>,
        Leo Yan <leo.yan@...aro.org>,
        Yang Jihong <yangjihong1@...wei.com>,
        James Clark <james.clark@....com>,
        Suzuki Poulouse <suzuki.poulose@....com>,
        Kang Minchul <tegongkang@...il.com>,
        Athira Rajeev <atrajeev@...ux.vnet.ibm.com>,
        "linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v1 01/40] perf stat: Introduce skippable evsels

The output got needlessly lengthened with recent changes from Ian.

These four metrics:
                  #     14.5 %  tma_retiring
                                                  #     27.6 %  tma_backend_bound
                                                  #     40.9 %  tma_frontend_bound
                                                  #     17.0 %  tma_bad_speculation
better be appended on the right hand side of these four events (as current perf-stat does):
           144,922      topdown-retiring:u
           411,266      topdown-fe-bound:u
           258,510      topdown-be-bound:u
           184,090      topdown-bad-spec:u

Also, I think we should not bother the default perf-stat users with the last two events:
             2,585      INT_MISC.UOP_DROPPING:u          #    4.528 M/sec
             3,434      cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/u #    6.015 M/sec

(yes, there are meant to increase accuracy of the previous tma_* level1 metrics, but the underlying event vary from model to model, e.g. SKL to ICL to SPR).

Besides, I can think on better metrics to append on the top-most TMA event (TOPDOWN.SLOTS). tma_retiring does not belong there.

Ahmad

-----Original Message-----
From: Ian Rogers <irogers@...gle.com> 
Sent: Wednesday, April 26, 2023 10:00
To: Arnaldo Carvalho de Melo <acme@...nel.org>; Kan Liang <kan.liang@...ux.intel.com>; Yasin, Ahmad <ahmad.yasin@...el.com>; Peter Zijlstra <peterz@...radead.org>; Ingo Molnar <mingo@...hat.com>; Eranian, Stephane <eranian@...gle.com>; Andi Kleen <ak@...ux.intel.com>; Taylor, Perry <perry.taylor@...el.com>; Alt, Samantha <samantha.alt@...el.com>; Biggers, Caleb <caleb.biggers@...el.com>; Wang, Weilin <weilin.wang@...el.com>; Baker, Edward <edward.baker@...el.com>; Mark Rutland <mark.rutland@....com>; Alexander Shishkin <alexander.shishkin@...ux.intel.com>; Jiri Olsa <jolsa@...nel.org>; Namhyung Kim <namhyung@...nel.org>; Hunter, Adrian <adrian.hunter@...el.com>; Florian Fischer <florian.fischer@...q.space>; Rob Herring <robh@...nel.org>; Zhengjun Xing <zhengjun.xing@...ux.intel.com>; John Garry <john.g.garry@...cle.com>; Kajol Jain <kjain@...ux.ibm.com>; Sumanth Korikkar <sumanthk@...ux.ibm.com>; Thomas Richter <tmricht@...ux.ibm.com>; Tiezhu Yang <yangtiezhu@...ngson.cn>; Ravi Bangoria <ravi.bangoria@....com>; Leo Yan <leo.yan@...aro.org>; Yang Jihong <yangjihong1@...wei.com>; James Clark <james.clark@....com>; Suzuki Poulouse <suzuki.poulose@....com>; Kang Minchul <tegongkang@...il.com>; Athira Rajeev <atrajeev@...ux.vnet.ibm.com>; linux-perf-users@...r.kernel.org; linux-kernel@...r.kernel.org
Cc: Ian Rogers <irogers@...gle.com>
Subject: [PATCH v1 01/40] perf stat: Introduce skippable evsels

Perf stat with no arguments will use default events and metrics. These events may fail to open even with kernel and hypervisor disabled. When these fail then the permissions error appears even though they were implicitly selected. This is particularly a problem with the automatic selection of the TopdownL1 metric group on certain architectures like
Skylake:

```
$ perf stat true
Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open access to performance monitoring and observability operations for processes without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 2:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access = 1: Disallow 
>CPU event access = 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>) ```

This patch adds skippable evsels that when they fail to open won't fail and won't appear in output. The TopdownL1 events, from the metric group, are marked as skippable. This turns the failure above to:

```
$ perf stat true

 Performance counter stats for 'true':

              1.26 msec task-clock:u                     #    0.328 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                49      page-faults:u                    #   38.930 K/sec
           176,449      cycles:u                         #    0.140 GHz                         (48.99%)
           122,905      instructions:u                   #    0.70  insn per cycle
            28,264      branches:u                       #   22.456 M/sec
             2,405      branch-misses:u                  #    8.51% of all branches

       0.003834565 seconds time elapsed

       0.000000000 seconds user
       0.004130000 seconds sys
```

When the events can have kernel/hypervisor disabled, like on Tigerlake, then it continues to succeed as:

```
$ perf stat true

 Performance counter stats for 'true':

              0.57 msec task-clock:u                     #    0.385 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                47      page-faults:u                    #   82.329 K/sec
           287,017      cycles:u                         #    0.503 GHz
           133,318      instructions:u                   #    0.46  insn per cycle
            31,396      branches:u                       #   54.996 M/sec
             2,442      branch-misses:u                  #    7.78% of all branches
           998,790      TOPDOWN.SLOTS:u                  #     14.5 %  tma_retiring
                                                  #     27.6 %  tma_backend_bound
                                                  #     40.9 %  tma_frontend_bound
                                                  #     17.0 %  tma_bad_speculation
           144,922      topdown-retiring:u
           411,266      topdown-fe-bound:u
           258,510      topdown-be-bound:u
           184,090      topdown-bad-spec:u
             2,585      INT_MISC.UOP_DROPPING:u          #    4.528 M/sec
             3,434      cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/u #    6.015 M/sec

       0.001480954 seconds time elapsed

       0.000000000 seconds user
       0.001686000 seconds sys
```

And this likewise works if paranoia allows or running as root.

Signed-off-by: Ian Rogers <irogers@...gle.com>
---
 tools/perf/builtin-stat.c      | 39 ++++++++++++++++++++++++++--------
 tools/perf/util/evsel.c        | 15 +++++++++++--
 tools/perf/util/evsel.h        |  1 +
 tools/perf/util/stat-display.c |  4 ++++
 4 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index efda63f6bf32..eb34f5418ad3 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -667,6 +667,13 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
 			evsel_list->core.threads->err_thread = -1;
 			return COUNTER_RETRY;
 		}
+	} else if (counter->skippable) {
+		if (verbose > 0)
+			ui__warning("skipping event %s that kernel failed to open .\n",
+				    evsel__name(counter));
+		counter->supported = false;
+		counter->errored = true;
+		return COUNTER_SKIP;
 	}
 
 	evsel__open_strerror(counter, &target, errno, msg, sizeof(msg)); @@ -1885,15 +1892,29 @@ static int add_default_attributes(void)
 		 * Add TopdownL1 metrics if they exist. To minimize
 		 * multiplexing, don't request threshold computation.
 		 */
-		if (metricgroup__has_metric("TopdownL1") &&
-		    metricgroup__parse_groups(evsel_list, "TopdownL1",
-					    /*metric_no_group=*/false,
-					    /*metric_no_merge=*/false,
-					    /*metric_no_threshold=*/true,
-					    stat_config.user_requested_cpu_list,
-					    stat_config.system_wide,
-					    &stat_config.metric_events) < 0)
-			return -1;
+		if (metricgroup__has_metric("TopdownL1")) {
+			struct evlist *metric_evlist = evlist__new();
+			struct evsel *metric_evsel;
+
+			if (!metric_evlist)
+				return -1;
+
+			if (metricgroup__parse_groups(metric_evlist, "TopdownL1",
+							/*metric_no_group=*/false,
+							/*metric_no_merge=*/false,
+							/*metric_no_threshold=*/true,
+							stat_config.user_requested_cpu_list,
+							stat_config.system_wide,
+							&stat_config.metric_events) < 0)
+				return -1;
+
+			evlist__for_each_entry(metric_evlist, metric_evsel) {
+				metric_evsel->skippable = true;
+			}
+			evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries);
+			evlist__delete(metric_evlist);
+		}
+
 		/* Platform specific attrs */
 		if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
 			return -1;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 356c07f03be6..1cd04b5998d2 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -290,6 +290,7 @@ void evsel__init(struct evsel *evsel,
 	evsel->per_pkg_mask  = NULL;
 	evsel->collect_stat  = false;
 	evsel->pmu_name      = NULL;
+	evsel->skippable     = false;
 }
 
 struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx) @@ -1725,9 +1726,13 @@ static int get_group_fd(struct evsel *evsel, int cpu_map_idx, int thread)
 		return -1;
 
 	fd = FD(leader, cpu_map_idx, thread);
-	BUG_ON(fd == -1);
+	BUG_ON(fd == -1 && !leader->skippable);
 
-	return fd;
+	/*
+	 * When the leader has been skipped, return -2 to distinguish from no
+	 * group leader case.
+	 */
+	return fd == -1 ? -2 : fd;
 }
 
 static void evsel__remove_fd(struct evsel *pos, int nr_cpus, int nr_threads, int thread_idx) @@ -2109,6 +2114,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 
 			group_fd = get_group_fd(evsel, idx, thread);
 
+			if (group_fd == -2) {
+				pr_debug("broken group leader for %s\n", evsel->name);
+				err = -EINVAL;
+				goto out_close;
+			}
+
 			test_attr__ready();
 
 			/* Debug message used by test scripts */ diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index 35805dcdb1b9..bf8f01af1c0b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -95,6 +95,7 @@ struct evsel {
 		bool			weak_group;
 		bool			bpf_counter;
 		bool			use_config_name;
+		bool			skippable;
 		int			bpf_fd;
 		struct bpf_object	*bpf_obj;
 		struct list_head	config_terms;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c index e6035ecbeee8..6b46bbb3d322 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -810,6 +810,10 @@ static bool should_skip_zero_counter(struct perf_stat_config *config,
 	struct perf_cpu cpu;
 	int idx;
 
+	/* Skip counters that were speculatively/default enabled rather than requested. */
+	if (counter->skippable)
+		return true;
+
 	/*
 	 * Skip value 0 when enabling --per-thread globally,
 	 * otherwise it will have too many 0 output.
--
2.40.1.495.gc816e09b53d-goog

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ