Message-ID: <c4d56a1c-3880-3ec4-17e7-91d2d428a94a@oracle.com>
Date: Fri, 14 Oct 2022 16:47:52 -0700
From: Dongli Zhang <dongli.zhang@...cle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: linux-perf-users <linux-perf-users@...r.kernel.org>,
KVM <kvm@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, joe.jin@...cle.com,
Like Xu <likexu@...cent.com>, kan.liang@...ux.intel.com
Subject: Re: [PATCH RFC 1/1] perf stat: do not fatal if the leader is errored
Hi Namhyung,
On 10/14/22 3:16 PM, Namhyung Kim wrote:
> Hello,
>
> On Thu, Sep 22, 2022 at 12:10 AM Dongli Zhang <dongli.zhang@...cle.com> wrote:
>>
>> Adding kvm@...r.kernel.org since this issue occurs in a virtualization environment.
>>
>> The topdown metrics events have been enabled by default since
>> commit 42641d6f4d15 ("perf stat: Add Topdown metrics events as default
>> events"). perf uses 'slots' if
>> /sys/bus/event_source/devices/cpu/events/slots is available.
>>
>> Unfortunately, 'slots' may not be supported in a virtualization
>> environment. The hypervisor may not expose the 'slots' counter to the VM
>> in CPUID. As a result, the kernel may disable the topdown slots and metrics
>> events in intel_pmu_init() if the slots event is not in CPUID, e.g., both
>> c->weight and c->idxmsk64 are set to 0.
>>
>> The following error occurs on an Icelake VM because 'slots' is the group leader:
>>
>> $ perf stat
>> Error:
>> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (slots).
>> /bin/dmesg | grep -i perf may provide additional information.
>>
>> This is because stat_handle_error() returns COUNTER_FATAL when
>> 'slots' is the leader of an event group.
>>
>> There are three options to fix the issue.
>>
>> 1. Do not expose /sys/bus/event_source/devices/cpu/events/slots to
>> userspace so that pmu_have_event(pmu->name, "slots") returns false.
>>
>> 2. Run CPUID from perf in userspace and avoid using 'slots' if CPUID
>> does not report it as supported.
>>
>> 3. Do not make perf fail fatally if the leader fails, and do not create
>> events for an evsel whose leader has already failed.
>>
>> This RFC patch implements the 3rd option. Would you mind suggesting which
>> option is better?
>
> Sorry for the late reply but I think option 1 is the way to go.
>
> Option 3 would be a transient workaround and it would affect
> other events too. If it's really needed, I think option 2 is slightly better
> than option 3. Or, we can add --force option to skip non-supported
> events explicitly.
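For context on option 1: perf decides whether to use 'slots' by checking whether the sysfs event alias exists, so hiding that file in the kernel is enough to keep perf from adding the event. Below is a minimal sketch of that kind of check, assuming only the sysfs path quoted above. The helper name sysfs_pmu_has_event() is hypothetical; this is not perf's actual pmu_have_event() implementation.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not perf's pmu_have_event()): report whether the
 * kernel exposes a named event alias for a PMU under sysfs, e.g.
 * /sys/bus/event_source/devices/cpu/events/slots.
 */
static bool sysfs_pmu_has_event(const char *pmu, const char *event)
{
	char path[256];
	int n = snprintf(path, sizeof(path),
			 "/sys/bus/event_source/devices/%s/events/%s",
			 pmu, event);

	/* Treat an encoding error or a truncated path as "not present". */
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;

	/*
	 * Option 1 relies on the kernel hiding this file when the counter
	 * is unusable, so existence of the file is the whole check.
	 */
	return access(path, F_OK) == 0;
}
```

With this kind of check, a kernel that omits events/slots on a VM without the counter makes the caller fall back to the non-topdown default events, with no change needed in perf itself.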
About options 2 and 3, I will wait for more comments in case anyone still
prefers a userspace change, e.g., a "--force" option that avoids the failure
and just reports the events as not supported.
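For comparison, option 2 would mean checking CPUID in perf itself. A rough sketch follows, assuming the Intel SDM layout of CPUID leaf 0xA: EAX[31:24] is the length of the EBX event bit vector, and a set EBX bit 7 means the top-down slots event is not available. The function names are hypothetical and this is not code from perf or the kernel; it uses the GCC/Clang <cpuid.h> helper and is x86-only.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Bit index of the top-down slots event in CPUID.0AH:EBX (per Intel SDM). */
#define ARCH_PERF_EVT_TOPDOWN_SLOTS 7

/*
 * Decide from raw CPUID leaf 0xA registers whether the architectural
 * top-down slots event is enumerated. EAX[31:24] is the length of the
 * EBX bit vector; a *set* EBX bit means the event is NOT available.
 */
static bool cpuid_slots_available(uint32_t eax, uint32_t ebx)
{
	uint32_t mask_len = (eax >> 24) & 0xff;

	/* Bit 7 is only meaningful if the vector covers at least 8 bits. */
	if (mask_len <= ARCH_PERF_EVT_TOPDOWN_SLOTS)
		return false;
	return !(ebx & (1u << ARCH_PERF_EVT_TOPDOWN_SLOTS));
}

/* Query the CPU (or, in a guest, the vCPU model the hypervisor exposes). */
static bool host_cpu_has_slots(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 if leaf 0xA exceeds the max basic leaf. */
	if (!__get_cpuid(0xa, &eax, &ebx, &ecx, &edx))
		return false;
	return cpuid_slots_available(eax, ebx);
#else
	return false;
#endif
}
```

The downside is that this duplicates in userspace the CPUID information the kernel already consults in intel_pmu_init().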
I have already sent a version that may impact other events. Please ignore
it, since you think option 1 is enough.
https://lore.kernel.org/all/20221010050113.13050-1-dongli.zhang@oracle.com/
About option 1, there is a bugfix from Kan pending review.
https://lore.kernel.org/all/20220922201505.2721654-1-kan.liang@linux.intel.com/
Thank you very much!
Dongli Zhang
>
> Thanks,
> Namhyung
>
>>
>> Here is the output with the patch applied.
>>
>> $ perf stat -v
>> Using CPUID GenuineIntel-6-6A-6
>> slots -> cpu/event=0,umask=0x4/
>> topdown-retiring -> cpu/event=0,umask=0x80/
>> topdown-bad-spec -> cpu/event=0,umask=0x81/
>> topdown-fe-bound -> cpu/event=0,umask=0x82/
>> topdown-be-bound -> cpu/event=0,umask=0x83/
>> Control descriptor is not initialized
>> Warning:
>> slots event is not supported by the kernel.
>> ^Ccpu-clock: 62021481051 62021480237 62021480237
>> context-switches: 437 62021478064 62021478064
>> cpu-migrations: 17 62021475294 62021475294
>> page-faults: 12 62021471925 62021471925
>> cycles: 15662273 62020909141 62020909141
>> instructions: 6580385 62008944246 62008944246
>> branches: 1446119 62008855550 62008855550
>> branch-misses: 30970 62008643255 62008643255
>> failed to read counter slots
>> failed to read counter topdown-retiring
>> failed to read counter topdown-bad-spec
>> failed to read counter topdown-fe-bound
>> failed to read counter topdown-be-bound
>>
>> Performance counter stats for 'system wide':
>>
>> 62,021.48 msec cpu-clock # 16.006 CPUs utilized
>> 437 context-switches # 7.046 /sec
>> 17 cpu-migrations # 0.274 /sec
>> 12 page-faults # 0.193 /sec
>> 15,662,273 cycles # 0.000 GHz
>> 6,580,385 instructions # 0.42 insn per cycle
>> 1,446,119 branches # 23.316 K/sec
>> 30,970 branch-misses # 2.14% of all branches
>> <not supported> slots
>> <not supported> topdown-retiring
>> <not supported> topdown-bad-spec
>> <not supported> topdown-fe-bound
>> <not supported> topdown-be-bound
>>
>> 3.874991326 seconds time elapsed
>>
>> Thank you very much!
>>
>> Cc: Joe Jin <joe.jin@...cle.com>
>> Signed-off-by: Dongli Zhang <dongli.zhang@...cle.com>
>> ---
>> tools/perf/builtin-stat.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index 0b4a62e4ff67..1053cf0886c0 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -762,9 +762,7 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
>> */
>> counter->errored = true;
>>
>> - if ((evsel__leader(counter) != counter) ||
>> - !(counter->core.leader->nr_members > 1))
>> - return COUNTER_SKIP;
>> + return COUNTER_SKIP;
>> } else if (evsel__fallback(counter, errno, msg, sizeof(msg))) {
>> if (verbose > 0)
>> ui__warning("%s\n", msg);
>> @@ -843,6 +841,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>> if (target.use_bpf)
>> break;
>>
>> + if (evsel__leader(counter)->errored)
>> + continue;
>> if (counter->reset_group || counter->errored)
>> continue;
>> if (evsel__is_bpf(counter))
>> @@ -901,6 +901,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>> evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) {
>> counter = evlist_cpu_itr.evsel;
>>
>> + if (evsel__leader(counter)->errored)
>> + continue;
>> if (!counter->reset_group && !counter->errored)
>> continue;
>> if (!counter->reset_group)
>> --
>> 2.17.1
>>
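To make the behavioral change in the diff above concrete, here is a simplified model of the decision logic. The struct and function names are hypothetical stand-ins (not perf's struct evsel or stat_handle_error()), and the model ignores errno classification, evsel__fallback() and the reset_group path. Before the patch, an unsupported event that leads a group of more than one member is fatal; with the patch, it is skipped, and members whose leader has errored are not opened.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for perf's evsel. */
struct model_evsel {
	struct model_evsel *leader; /* points to itself for a group leader */
	int nr_members;             /* group size, meaningful on the leader */
	bool errored;               /* sticky "open failed" flag */
};

enum model_recovery { MODEL_SKIP, MODEL_FATAL };

/*
 * Pre-patch rule for an unsupported event: a group member or a
 * standalone event is skipped, but the leader of a real group is fatal.
 */
static enum model_recovery old_handle_error(struct model_evsel *c)
{
	c->errored = true;
	if (c->leader != c || !(c->leader->nr_members > 1))
		return MODEL_SKIP;
	return MODEL_FATAL;
}

/* RFC rule: always mark the counter errored and skip it. */
static enum model_recovery new_handle_error(struct model_evsel *c)
{
	c->errored = true;
	return MODEL_SKIP;
}

/* RFC rule in the open loops: do not open a member whose leader failed. */
static bool should_open(const struct model_evsel *c)
{
	return !c->leader->errored && !c->errored;
}
```

For the Icelake VM case, 'slots' leads a five-event group, so old_handle_error() returns MODEL_FATAL while new_handle_error() returns MODEL_SKIP, and the topdown-* members are then reported as not supported instead of being opened.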