lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 17 Mar 2022 16:53:46 +0100
From:   Thomas Richter <tmricht@...ux.ibm.com>
To:     linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        acme@...nel.org, namhyung@...nel.org, gor@...ux.ibm.com,
        acme@...hat.com
Cc:     svens@...ux.ibm.com, sumanthk@...ux.ibm.com, hca@...ux.ibm.com,
        Thomas Richter <tmricht@...ux.ibm.com>
Subject: [PATCH] perf/stat: Fix perf stat for forked applications

I have run into the following issue:

 # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7

 Performance counter stats for 'system wide':

                 0      new_pmu/INSTRUCTION_7/

       0.000366428 seconds time elapsed
 #

The new PMU for s390 counts the execution of certain CPU instructions.
The root cause is the extremely small run time of the
mytest program. It just executes some assembly instructions
and then exits. In above invocation the instruction is executed
exactly one time (-c1 option). The PMU is expected to report this one
time execution by a counter value of one, but fails to do so
in some cases, not all.

Debugging reveals the invocation of the child process is done
*before* the counter events are installed and enabled. Tracing
reveals that sometimes the child process starts and exits before
the event is installed on all CPUs. The more CPUs the machine has,
the more often this miscount happens.

Fix this by reversing the start of the work load after the events
have been installed on the specified CPUs. Now the comment also
matches the code.

Output after:
 # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7

 Performance counter stats for 'system wide':

                 1      new_pmu/INSTRUCTION_7/

       0.000366428 seconds time elapsed
 #

Now the correct result is reported rock solid all the time regardless
how many CPUs are online.

Fixes:  acf2892270dc ("perf stat: Use perf_evlist__prepare/start_workload())

Signed-off-by: Thomas Richter <tmricht@...ux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@...ux.ibm.com>
Cc: Namhyung Kim <namhyung@...nel.org>
---
 tools/perf/builtin-stat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 3f98689dd687..60baa3dadc4b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -955,10 +955,10 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	 * Enable counters and exec the command:
 	 */
 	if (forks) {
-		evlist__start_workload(evsel_list);
 		err = enable_counters();
 		if (err)
 			return -1;
+		evlist__start_workload(evsel_list);
 
 		t0 = rdclock();
 		clock_gettime(CLOCK_MONOTONIC, &ref_time);
-- 
2.35.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ