Message-ID: <6871e91f-997b-8558-e87b-7bf147f2750c@amd.com>
Date: Wed, 1 Feb 2023 12:21:32 +0530
From: Ravi Bangoria <ravi.bangoria@....com>
To: Ian Rogers <irogers@...gle.com>,
"Liang, Kan" <kan.liang@...ux.intel.com>,
"Xing, Zhengjun" <zhengjun.xing@...el.com>, sedat.dilek@...il.com
Cc: Arnaldo Carvalho de Melo <acme@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Nick Desaulniers <ndesaulniers@...gle.com>,
Nathan Chancellor <natechancellor@...il.com>,
llvm@...ts.linux.dev, Ben Hutchings <benh@...ian.org>,
James Clark <james.clark@....com>,
Stephane Eranian <eranian@...gle.com>,
Ravi Bangoria <ravi.bangoria@....com>
Subject: Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED!
Hi Ian,
> So I think this is a kernel bug triggering a perf tool bug. The kernel
> bug can be worked around in the perf tool. I only had an Ivybridge to
> test with (hence slightly different events) but what I see is both
> tma_dram_bound and tma_l3_bound using the same 4 events. I could work
> around the "<not counted>" by adding the --metric-no-group flag:
>
> ```
> $ perf stat -M tma_l3_bound --metric-no-group -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 400,404 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 4.3 % tma_l3_bound (74.99%)
> 128,937,891 CYCLE_ACTIVITY.STALLS_L2_PENDING (87.46%)
> 167,459 MEM_LOAD_UOPS_RETIRED.LLC_MISS (74.99%)
> 759,574,967 CPU_CLK_UNHALTED.THREAD (87.47%)
>
> 1.001526438 seconds time elapsed
>
> $ perf stat -M tma_dram_bound -a --metric-no-group sleep 1
>
> Performance counter stats for 'system wide':
>
> 259,954 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 15.2 % tma_dram_bound (74.99%)
> 118,807,043 CYCLE_ACTIVITY.STALLS_L2_PENDING (87.46%)
> 111,699 MEM_LOAD_UOPS_RETIRED.LLC_MISS (74.95%)
> 587,571,060 CPU_CLK_UNHALTED.THREAD (87.45%)
>
> 1.001518093 seconds time elapsed
> ```
>
> The issue is that perf metrics use weak groups of events. A weak group
> is initially the same as a normal group of events. We want to use
> groups of events with metrics so that all the counters are scheduled in
> and out at the same time, rather than multiplexed independently. Imagine
> measuring IPC where the instruction and cycle counts are taken over
> different periods: the resulting IPC value would be unlikely to be
> accurate. If perf_event_open fails, the perf tool retries the events
> without the group. If I try just 3 of the events in a weak group, the
> failure can be seen:
>
> ```
> $ perf stat -e "{MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING}:W" -a sleep 1
>
> Performance counter stats for 'system wide':
>
> <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0.00%)
> <not counted> MEM_LOAD_UOPS_RETIRED.LLC_MISS (0.00%)
> <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0.00%)
>
> 1.001458485 seconds time elapsed
> ```
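To make the IPC point above concrete, here is a toy model of why independently multiplexed counters can skew a ratio. It assumes perf's usual scaling of a multiplexed count, raw * time_enabled / time_running (the percentage perf prints is time_running / time_enabled). The two-phase workload numbers and the helpers `scale`, `measure` and `estimated_ipc` are made up for illustration only.

```python
# Toy model of counter multiplexing; all workload numbers are made up.
# Each phase: (duration_ms, instructions, cycles)
PHASES = [
    (500, 3_000_000_000, 1_000_000_000),  # phase A: IPC 3.0
    (500, 1_000_000_000, 2_000_000_000),  # phase B: IPC 0.5
]
ENABLED_MS = sum(p[0] for p in PHASES)  # counters enabled for the whole run


def scale(raw, time_enabled, time_running):
    """Extrapolate a multiplexed count over the whole enabled window."""
    if time_running == 0:
        return None  # never scheduled: perf reports "<not counted>"
    return raw * time_enabled / time_running


def measure(column, active_phases):
    """Raw count and running time of a counter that was on the PMU only
    during active_phases (column 1 = instructions, 2 = cycles)."""
    raw = sum(PHASES[p][column] for p in active_phases)
    running = sum(PHASES[p][0] for p in active_phases)
    return raw, running


def estimated_ipc(inst_phases, cyc_phases):
    inst_raw, inst_run = measure(1, inst_phases)
    cyc_raw, cyc_run = measure(2, cyc_phases)
    return scale(inst_raw, ENABLED_MS, inst_run) / scale(cyc_raw, ENABLED_MS, cyc_run)


true_ipc = sum(p[1] for p in PHASES) / sum(p[2] for p in PHASES)
grouped = estimated_ipc([1], [1])      # both counters scheduled together in phase B
independent = estimated_ipc([0], [1])  # instructions seen in A, cycles seen in B
print(true_ipc, grouped, independent)
```

The grouped estimate (0.5) is at least the real IPC of the window it observed, while the independent one (1.5) pairs phase A's instructions with phase B's cycles and matches no interval that actually ran.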
>
> The kernel should have failed the perf_event_open on opening the third
> event and then measured without the group,
IIUC, the kernel should not fail to open the 3rd event, because there are 4
general-purpose counters on Intel and all three events can be scheduled
on any of the 4 counters (I checked IvyBridge).
What I don't understand, however, is why the kernel failed to schedule the
group. Unless something has already occupied 2 or more GP counters, the
group should get scheduled fine.
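Put differently, this is a matching problem: every event in the group needs its own free counter that it is allowed to use. A minimal sketch under a simplified model, assuming each event's usable counters are given as a set; `schedule_group`, the masks and `occupied` are hypothetical, and this is not the kernel's algorithm (the x86 PMU code uses its own constraint tables and scheduling heuristics).

```python
def schedule_group(event_masks, num_counters, occupied=frozenset()):
    """Try to place every event of a group on a distinct free counter.

    event_masks[i] is the set of counter indices event i may use
    (a hypothetical stand-in for per-event counter constraints).
    Returns {event_index: counter}, or None if the group cannot be
    scheduled as a unit.
    """
    free = set(range(num_counters)) - set(occupied)
    owner = {}  # counter -> event index currently assigned to it

    def try_assign(ev, seen):
        # Augmenting-path step of bipartite matching (Kuhn's algorithm).
        for c in event_masks[ev]:
            if c not in free or c in seen:
                continue
            seen.add(c)
            # Take an unowned counter, or move its current owner elsewhere.
            if c not in owner or try_assign(owner[c], seen):
                owner[c] = ev
                return True
        return False

    for ev in range(len(event_masks)):
        if not try_assign(ev, set()):
            return None
    return {ev: c for c, ev in owner.items()}


GP = set(range(4))  # hypothetical: each event may use any of the 4 GP counters
print(schedule_group([GP] * 3, 4))                   # a dict: one counter per event
print(schedule_group([GP] * 3, 4, occupied={0, 1}))  # None: only 2 counters left
```

With all three events allowed on any of the 4 GP counters, the group only fails once 2 counters are already taken.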
> which it can do with
> multiplexing as in the following:
>
> ```
> $ perf stat -e "MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING" -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 1,239,397 MEM_LOAD_UOPS_RETIRED.LLC_HIT (79.06%)
> 174,826 MEM_LOAD_UOPS_RETIRED.LLC_MISS (64.60%)
> 124,026,024 CYCLE_ACTIVITY.STALLS_L2_PENDING (81.16%)
>
> 1.001483434 seconds time elapsed
> ```
>
> When the --metric-no-group flag is given, perf doesn't create the
> initial weak group, which works around the kernel bug of not failing
> the 3rd perf_event_open. I've added Kan and Zhengjun to the e-mail as
> they work on the Intel kernel PMU code.
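For reference, the weak-group fallback described above amounts to something like the following sketch. `open_event`, `close_event` and `OpenError` are hypothetical stand-ins for perf_event_open() and its failure, not the perf tool's actual code. It also shows why the fallback never triggers here: if the kernel accepts a group it can never schedule, nothing raises, the events stay grouped, and we end up with "<not counted>".

```python
class OpenError(Exception):
    """Hypothetical stand-in for perf_event_open() returning an error."""


def open_weak_group(events, open_event, close_event):
    """Open events as one group; if any open fails, reopen them ungrouped.

    open_event(name, leader) returns a handle or raises OpenError;
    leader is None for a group leader or a standalone event.
    Returns (handles, grouped).
    """
    handles = []
    try:
        for name in events:
            leader = handles[0] if handles else None
            handles.append(open_event(name, leader))
        return handles, True
    except OpenError:
        # Drop the partially opened group...
        for h in handles:
            close_event(h)
        # ...and open each event on its own so the kernel can
        # multiplex them independently.
        return [open_event(name, None) for name in events], False
```

The sketch relies entirely on the open call failing for an oversized group; that is exactly the check that did not happen in the failing example.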
>
> There's a question of what we should do about this in the perf test.
> I have a few possible solutions:
>
> 1) try metric tests again with the --metric-no-group flag and don't
> fail the test if this succeeds. This allows kernel bugs to hide, so
> I'm not a huge fan.
>
> 2) add a new metric flag/constraint to say not to group, this way the
> metric will automatically apply the "--metric-no-group" flag. It is a
> bit of work to wire this up but this kind of failure is common enough
> in PMUs that it is probably worthwhile. We also need to add the flag
> to metrics and I'm not sure how to get a good list of the metrics that
> currently fail and require it. This is okay but error-prone.
>
> 3) fix the kernel bug and let the perf test fail until an adequate
> kernel is installed. Probably the best option.
Thanks,
Ravi