linux-kernel - [PATCH RFC 1/1] perf stat: do not fatal if the leader is errored

Message-Id: <20220922071017.17398-1-dongli.zhang@oracle.com>
Date:   Thu, 22 Sep 2022 00:10:17 -0700
From:   Dongli Zhang <dongli.zhang@...cle.com>
To:     linux-perf-users@...r.kernel.org, kvm@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org, peterz@...radead.org,
        mingo@...hat.com, acme@...nel.org, mark.rutland@....com,
        alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
        namhyung@...nel.org, joe.jin@...cle.com, likexu@...cent.com
Subject: [PATCH RFC 1/1] perf stat: do not fatal if the leader is errored

Adding kvm@...r.kernel.org because this issue occurs in virtualized
environments.

The topdown metrics events have been default events since
commit 42641d6f4d15 ("perf stat: Add Topdown metrics events as default
events"). perf uses 'slots' if
/sys/bus/event_source/devices/cpu/events/slots is available.
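
The availability test described above can be approximated by checking
whether the sysfs file is readable. This is only a minimal sketch: the
helper name sysfs_event_present() is hypothetical, and perf itself goes
through its PMU alias parsing (pmu_have_event()) rather than a bare
access() call.

```c
#include <stdbool.h>
#include <unistd.h>

/* Path quoted in the commit message above. */
#define SLOTS_SYSFS_PATH "/sys/bus/event_source/devices/cpu/events/slots"

/*
 * Hypothetical helper, not part of perf: report whether the given
 * sysfs event file exists and is readable by the caller.
 */
static bool sysfs_event_present(const char *path)
{
	return access(path, R_OK) == 0;
}
```

Calling sysfs_event_present(SLOTS_SYSFS_PATH) on the affected Icelake VM
would return true even though the event cannot be opened, which is the
mismatch this patch works around.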

Unfortunately, 'slots' may not be supported in a virtualized
environment. The hypervisor may not expose the 'slots' counter to the VM
in CPUID. As a result, the kernel may disable the topdown slots and
metrics events in intel_pmu_init() if the slots event is not in CPUID,
e.g., by setting both c->weight and c->idxmsk64 to 0.

The following error is reported on an Icelake VM because 'slots' is the
group leader:

$ perf stat
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (slots).
/bin/dmesg | grep -i perf may provide additional information.

This is because stat_handle_error() returns COUNTER_FATAL when the
failing event, here 'slots', is the leader of an event group.

There are three options to fix the issue.

1. Do not expose /sys/bus/event_source/devices/cpu/events/slots to
userspace so that pmu_have_event(pmu->name, "slots") returns false.

2. Query CPUID from perf userspace and avoid using 'slots' if CPUID does
not report it.

3. Do not treat a failed leader as fatal in perf. Do not create events
for an evsel whose leader has already failed.
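
For reference, the check in option 2 might look like the sketch below.
It assumes the CPUID leaf 0xA layout described in the Intel SDM, where
EAX[31:24] gives the length of the architectural event bit vector and a
set bit 7 in EBX means the Topdown Slots event is not available. The
helper name cpuid_has_slots() is hypothetical, and the kernel's own check
in intel_pmu_init() may consult different fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of the Topdown Slots event in CPUID.0AH:EBX. */
#define ARCH_EVENT_TOPDOWN_SLOTS_BIT 7

/*
 * Hypothetical helper: decide from raw CPUID.0AH register values
 * whether the Topdown Slots architectural event is enumerated.
 * In EBX a set bit means "event NOT available".
 */
static bool cpuid_has_slots(uint32_t eax, uint32_t ebx)
{
	uint32_t mask_len = (eax >> 24) & 0xff;

	/* The bit is only meaningful if the bit vector is long enough. */
	if (mask_len <= ARCH_EVENT_TOPDOWN_SLOTS_BIT)
		return false;

	return !(ebx & (1u << ARCH_EVENT_TOPDOWN_SLOTS_BIT));
}
```

On x86 with GCC or Clang, the register values could be obtained with
__get_cpuid_count(0xa, 0, &eax, &ebx, &ecx, &edx) from <cpuid.h>.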

This RFC patch implements the third option. Could you suggest which
option is preferable?
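
The intended behaviour of the third option can be modelled outside of
perf as follows. This is a toy sketch, not the perf code: struct
toy_evsel and toy_open_all() are hypothetical stand-ins for struct evsel
and the open loop in __run_perf_stat(), and the 'supported' flag fakes
the result of sys_perf_event_open().

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for perf's struct evsel; only what the sketch needs. */
struct toy_evsel {
	const char *name;
	struct toy_evsel *leader;	/* a group leader points to itself */
	bool supported;			/* pretend outcome of the open syscall */
	bool errored;
	bool opened;
};

/*
 * Open every event, skipping members whose leader has already failed.
 * Leaders are expected to precede their members in the array.
 * Returns the number of events that were opened.
 */
static size_t toy_open_all(struct toy_evsel *evsels, size_t n)
{
	size_t opened = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_evsel *e = &evsels[i];

		/* Mirrors the added evsel__leader(counter)->errored check. */
		if (e->leader->errored)
			continue;

		if (!e->supported) {
			/* Skip the event instead of aborting the whole run. */
			e->errored = true;
			continue;
		}

		e->opened = true;
		opened++;
	}

	return opened;
}
```

With a group led by an unsupported 'slots' event plus a standalone
'cycles' event, only 'cycles' is opened and the topdown members are
never attempted.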

Here is the output with the patch applied.

$ perf stat -v
Using CPUID GenuineIntel-6-6A-6
slots -> cpu/event=0,umask=0x4/
topdown-retiring -> cpu/event=0,umask=0x80/
topdown-bad-spec -> cpu/event=0,umask=0x81/
topdown-fe-bound -> cpu/event=0,umask=0x82/
topdown-be-bound -> cpu/event=0,umask=0x83/
Control descriptor is not initialized
Warning:
slots event is not supported by the kernel.
^Ccpu-clock: 62021481051 62021480237 62021480237
context-switches: 437 62021478064 62021478064
cpu-migrations: 17 62021475294 62021475294
page-faults: 12 62021471925 62021471925
cycles: 15662273 62020909141 62020909141
instructions: 6580385 62008944246 62008944246
branches: 1446119 62008855550 62008855550
branch-misses: 30970 62008643255 62008643255
failed to read counter slots
failed to read counter topdown-retiring
failed to read counter topdown-bad-spec
failed to read counter topdown-fe-bound
failed to read counter topdown-be-bound

 Performance counter stats for 'system wide':

         62,021.48 msec cpu-clock                        #   16.006 CPUs utilized          
               437      context-switches                 #    7.046 /sec                   
                17      cpu-migrations                   #    0.274 /sec                   
                12      page-faults                      #    0.193 /sec                   
        15,662,273      cycles                           #    0.000 GHz                    
         6,580,385      instructions                     #    0.42  insn per cycle         
         1,446,119      branches                         #   23.316 K/sec                  
            30,970      branch-misses                    #    2.14% of all branches        
   <not supported>      slots                                                       
   <not supported>      topdown-retiring                                            
   <not supported>      topdown-bad-spec                                            
   <not supported>      topdown-fe-bound                                            
   <not supported>      topdown-be-bound                                            

       3.874991326 seconds time elapsed

Thank you very much!

Cc: Joe Jin <joe.jin@...cle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@...cle.com>
---
 tools/perf/builtin-stat.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 0b4a62e4ff67..1053cf0886c0 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -762,9 +762,7 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
 		 */
 		counter->errored = true;
 
-		if ((evsel__leader(counter) != counter) ||
-		    !(counter->core.leader->nr_members > 1))
-			return COUNTER_SKIP;
+		return COUNTER_SKIP;
 	} else if (evsel__fallback(counter, errno, msg, sizeof(msg))) {
 		if (verbose > 0)
 			ui__warning("%s\n", msg);
@@ -843,6 +841,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 		if (target.use_bpf)
 			break;
 
+		if (evsel__leader(counter)->errored)
+			continue;
 		if (counter->reset_group || counter->errored)
 			continue;
 		if (evsel__is_bpf(counter))
@@ -901,6 +901,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 		evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) {
 			counter = evlist_cpu_itr.evsel;
 
+			if (evsel__leader(counter)->errored)
+				continue;
 			if (!counter->reset_group && !counter->errored)
 				continue;
 			if (!counter->reset_group)
-- 
2.17.1