linux-kernel - Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED!

Message-ID: <6871e91f-997b-8558-e87b-7bf147f2750c@amd.com>
Date:   Wed, 1 Feb 2023 12:21:32 +0530
From:   Ravi Bangoria <ravi.bangoria@....com>
To:     Ian Rogers <irogers@...gle.com>,
        "Liang, Kan" <kan.liang@...ux.intel.com>,
        "Xing, Zhengjun" <zhengjun.xing@...el.com>, sedat.dilek@...il.com
Cc:     Arnaldo Carvalho de Melo <acme@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Nathan Chancellor <natechancellor@...il.com>,
        llvm@...ts.linux.dev, Ben Hutchings <benh@...ian.org>,
        James Clark <james.clark@....com>,
        Stephane Eranian <eranian@...gle.com>,
        Ravi Bangoria <ravi.bangoria@....com>
Subject: Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED!

Hi Ian,

> So I think this is a kernel bug triggering a perf tool bug. The kernel
> bug can be worked around in the perf tool. I only had an Ivybridge to
> test with (hence slightly different events) but what I see is both
> tma_dram_bound and tma_l3_bound using the same 4 events. I could work
> around the "<not counted>" by adding the --metric-no-group flag:
> 
> ```
> $ perf stat -M tma_l3_bound --metric-no-group -a sleep 1
> 
> Performance counter stats for 'system wide':
> 
>           400,404      MEM_LOAD_UOPS_RETIRED.LLC_HIT     #      4.3 %  tma_l3_bound  (74.99%)
>       128,937,891      CYCLE_ACTIVITY.STALLS_L2_PENDING                              (87.46%)
>           167,459      MEM_LOAD_UOPS_RETIRED.LLC_MISS                                (74.99%)
>       759,574,967      CPU_CLK_UNHALTED.THREAD                                       (87.47%)
> 
>       1.001526438 seconds time elapsed
> 
> $ perf stat -M tma_dram_bound -a --metric-no-group sleep 1
> 
> Performance counter stats for 'system wide':
> 
>           259,954      MEM_LOAD_UOPS_RETIRED.LLC_HIT     #     15.2 %  tma_dram_bound  (74.99%)
>       118,807,043      CYCLE_ACTIVITY.STALLS_L2_PENDING                                (87.46%)
>           111,699      MEM_LOAD_UOPS_RETIRED.LLC_MISS                                  (74.95%)
>       587,571,060      CPU_CLK_UNHALTED.THREAD                                         (87.45%)
> 
>       1.001518093 seconds time elapsed
> ```
> 
> The issue is that perf metrics use weak groups of events. A weak group
> is the same as a group of events initially. We want to use groups of
> events with metrics so that all the counters are scheduled in and out
> at the same time, and not multiplexed independently. Imagine measuring
> IPC when the counts for instructions and cycles are taken over
> different periods: the resultant IPC value would be unlikely to be
> accurate. If perf_event_open fails then the perf tool retries the
> events without the group. If I try just 3 of the events in a weak
> group then the failure can be seen:
> 
> ```
> $ perf stat -e "{MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING}:W" -a sleep 1
> 
> Performance counter stats for 'system wide':
> 
>     <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_HIT       (0.00%)
>     <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_MISS      (0.00%)
>     <not counted>      CYCLE_ACTIVITY.STALLS_L2_PENDING    (0.00%)
> 
>       1.001458485 seconds time elapsed
> ```
> 
> The kernel should have failed the perf_event_open on opening the third
> event and then measured without the group,

IIUC, the kernel should not fail to open the 3rd event, because there are
4 general-purpose counters on Intel and all three events can be scheduled
on any of the 4 counters (I checked IvyBridge).

However, what I don't understand is why the kernel failed to schedule the
group. Unless someone has pre-occupied 2 or more GP counters, the group
should get scheduled fine.
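
To make the counting argument concrete, here is a toy model of it (not
kernel code). It assumes 4 general-purpose counters and that every event
in the group can be placed on any of them, so only the number of free
counters matters. The name group_fits and its parameters are made up for
this illustration.

```python
def group_fits(group_size, total_gp_counters=4, occupied=0):
    """Return True if a group needing `group_size` counters fits at once.

    Toy model only: assumes every event may use any general-purpose
    counter, so the check reduces to counting free counters.
    """
    free = total_gp_counters - occupied
    return group_size <= free


if __name__ == "__main__":
    # Show the verdict for a 3-event group as more counters are taken.
    for occupied in range(5):
        print(occupied, group_fits(3, occupied=occupied))
```

Under these assumptions the 3-event group fits with 0 or 1 counters
already taken and stops fitting once 2 or more are taken.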

> which it can do with
> multiplexing as in the following:
> 
> ```
> $ perf stat -e "MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING" -a sleep 1
> 
> Performance counter stats for 'system wide':
> 
>         1,239,397      MEM_LOAD_UOPS_RETIRED.LLC_HIT       (79.06%)
>           174,826      MEM_LOAD_UOPS_RETIRED.LLC_MISS      (64.60%)
>       124,026,024      CYCLE_ACTIVITY.STALLS_L2_PENDING    (81.16%)
> 
>       1.001483434 seconds time elapsed
> ```
> 
> When the --metric-no-group flag is given to perf then it doesn't
> produce the initial weak group, which works around the bug of the
> kernel not failing on the 3rd perf_event_open. I've added Kan and
> Zhengjun to the e-mail as they work on the Intel kernel PMU code.
> 
> There's a question about what we should do in the perf test about
> this? I have a few solutions:
> 
> 1) try metric tests again with the --metric-no-group flag and don't
> fail the test if this succeeds. This allows kernel bugs to hide, so
> I'm not a huge fan.
> 
> 2) add a new metric flag/constraint to say not to group, this way the
> metric will automatically apply the "--metric-no-group" flag. It is a
> bit of work to wire this up but this kind of failure is common enough
> in PMUs that it is probably worthwhile. We also need to add the flag
> to metrics and I'm not sure how to get a good list of the metrics that
> currently fail and require it. This is okay but error prone.
> 
> 3) fix the kernel bug and let the perf test fail until an adequate
> kernel is installed. Probably the best option.
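
For reference, the weak-group fallback described in the quoted text can
be modelled roughly as below. This is a toy sketch, not the perf tool's
code: plan_weak_group is a made-up name, and open_group is a hypothetical
callable standing in for opening all events as one group with
perf_event_open and reporting whether that succeeded.

```python
def plan_weak_group(events, open_group):
    """Decide how a weak group of events ends up being measured.

    `open_group(events)` is a hypothetical stand-in that returns True if
    opening the events as a single group succeeds. On failure, each event
    is retried on its own, outside the group.
    """
    if open_group(events):
        return [list(events)]             # one group, counted together
    return [[event] for event in events]  # fallback: ungrouped events


if __name__ == "__main__":
    events = ["LLC_HIT", "LLC_MISS", "STALLS_L2_PENDING"]
    print(plan_weak_group(events, lambda evs: True))
    print(plan_weak_group(events, lambda evs: False))
```

In the case reported above the group open evidently succeeds, so this
fallback is never taken and the events show up as <not counted>.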

Thanks,
Ravi