linux-kernel - Re: [PATCH v1] perf x86 test: Update hybrid expectations

Message-ID: <CAP-5=fWP-bb_Uv+Ev1URg6jGfXFRWG19OqEmbo3VKwe4dRreSA@mail.gmail.com>
Date: Wed, 3 Jan 2024 09:17:18 -0800
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Namhyung Kim <namhyung@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>, 
	Kan Liang <kan.liang@...ux.intel.com>, linux-perf-users@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] perf x86 test: Update hybrid expectations

On Wed, Jan 3, 2024 at 8:42 AM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
>
> On Tue, Jan 02, 2024 at 01:57:32PM -0800, Ian Rogers wrote:
> > The legacy events cpu-cycles and instructions have sysfs event
> > equivalents on x86 (see /sys/devices/cpu_core/events). As sysfs/JSON
> > events are now higher in priority than legacy events, the hybrid test
> > expectations are no longer met. To fix this, switch to legacy events
> > that don't have sysfs versions: cpu-cycles becomes cycles and
> > instructions becomes branches.
> >
> > Fixes: a24d9d9dc096 ("perf parse-events: Make legacy events lower priority than sysfs/JSON")
> > Signed-off-by: Ian Rogers <irogers@...gle.com>
>
> With it:
>
> root@...ber:/home/acme# perf test hybrid
>  71: Intel PT                                                        :
>  71.2: Intel PT hybrid CPU compatibility                             : Ok
>  75: x86 hybrid                                                      : Ok
> root@...ber:/home/acme#
>
> Applied.
>
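As a rough illustration of the priority issue the patch describes, the Python sketch below checks which legacy event names also exist as files in a PMU's sysfs events directory. With sysfs/JSON events now taking priority, those names would resolve to the sysfs event instead of the legacy one. The helper name and the list of names checked are illustrative assumptions, not part of perf.

```python
import os

# Legacy event names from the patch: the two it moves away from and the two
# it switches to.
LEGACY_NAMES = ("cpu-cycles", "instructions", "cycles", "branches")


def legacy_names_with_sysfs_alias(events_dir, legacy_names=LEGACY_NAMES):
    """Return the legacy names that also exist as files in events_dir.

    Hypothetical helper, not part of perf. With sysfs/JSON events taking
    priority over legacy events, any name returned here would be resolved
    to the sysfs event rather than the legacy hardware event.
    """
    if not os.path.isdir(events_dir):
        return []
    present = set(os.listdir(events_dir))
    return sorted(name for name in legacy_names if name in present)


if __name__ == "__main__":
    # Directory named in the patch description; prints [] if it is absent.
    print(legacy_names_with_sysfs_alias("/sys/devices/cpu_core/events"))
```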
> Now to look at this on this hybrid system (14700K):
>
> 101: perf all metricgroups test                                      : FAILED!
>
> Testing Mem
> event syntax error: '{cpu_core/UNC_ARB_DAT_OCCUPANCY.RD,cmask=1,metric-id=cpu_core!3UNC_ARB_DAT_OCCUPANCY.RD!0cmask!21!3/,UNC_ARB_DAT_OCCUPANCY.RD/metric-id=UNC_ARB_DAT_OCCUPANCY.RD/}:W,du..'
>                                \___ Bad event or PMU
>
> Unable to find PMU or event on a PMU of 'cpu_core'
>
> Initial error:
> event syntax error: '{cpu_core/UNC_ARB_DAT_OCCUPANCY.RD,cmask=1,metric-id=cpu_core!3UNC_ARB_DAT_OCCUPANCY.RD!0cmask!21!3/,UNC_ARB_DAT_OCCUPANCY.RD/metric-id=UNC_ARB_DAT_OCCUPANCY.RD/}:W,du..'
>                                \___ unknown term 'UNC_ARB_DAT_OCCUPANCY.RD' for pmu 'cpu_core'
>
> valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,config3,name,period,percore,metric-id
> test child finished with -1
> ---- end ----
> perf all metricgroups test: FAILED!
> root@...ber:/home/acme# grep -m1 "model name" /proc/cpuinfo
> model name      : Intel(R) Core(TM) i7-14700K
> root@...ber:/home/acme#
>
>
> root@...ber:/home/acme# ls -la /sys/devices/uncore_
> uncore_arb_0/              uncore_cbox_1/             uncore_cbox_2/             uncore_cbox_5/             uncore_cbox_8/             uncore_imc_0/              uncore_imc_free_running_1/
> uncore_arb_1/              uncore_cbox_10/            uncore_cbox_3/             uncore_cbox_6/             uncore_cbox_9/             uncore_imc_1/
> uncore_cbox_0/             uncore_cbox_11/            uncore_cbox_4/             uncore_cbox_7/             uncore_clock/              uncore_imc_free_running_0/
> root@...ber:/home/acme# ls -la /sys/devices/uncore_
>
>
> 102: perf all metrics test                                           : FAILED!
>
> event syntax error: '{cpu_core/UNC_ARB_DAT_OCCUPANCY.RD,cmask=1,metric-id=cpu..'
>                                \___ Bad event or PMU
>
> Unable to find PMU or event on a PMU of 'cpu_core'
>
> Initial error:
> event syntax error: '{cpu_core/UNC_ARB_DAT_OCCUPANCY.RD,cmask=1,metric-id=cpu..'
>                                \___ unknown term 'UNC_ARB_DAT_OCCUPANCY.RD' for pmu 'cpu_core'
>
> valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,config3,name,period,percore,metric-id

I'll take a look. UNC_ARB* events are counted by the uncore_arb_*
PMUs, so the cpu_core PMU shouldn't be specified. This looks like a
bug in how the metric is generated.
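A minimal sketch of the kind of check a metric generator could make, assuming (hypothetically) that Intel uncore events can be recognized by their "UNC_" name prefix. The two output forms mirror the strings in the error messages above: core events as cpu_core/EVENT,terms/ and PMU-less events as EVENT/terms/. The function qualify_event is illustrative only; it is not how create_perf_json.py or perf actually builds these strings.

```python
def qualify_event(event, terms, core_pmu="cpu_core"):
    """Build a perf event string for use in a metric.

    Hypothetical heuristic (assumption): names starting with "UNC_" are
    uncore events and are left without a PMU prefix, so perf can match them
    on the uncore PMUs (e.g. uncore_arb_*) rather than on core_pmu.
    """
    if event.startswith("UNC_"):
        # Uncore event: EVENT/terms/ form, or the bare name without terms.
        return f"{event}/{','.join(terms)}/" if terms else event
    # Core event: core_pmu/EVENT,terms/ form.
    return f"{core_pmu}/{','.join([event, *terms])}/"


print(qualify_event("UNC_ARB_DAT_OCCUPANCY.RD", ["cmask=1"]))
print(qualify_event("CPU_CLK_UNHALTED.THREAD", []))
```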

> Testing UNCORE_FREQ
> Metric 'UNCORE_FREQ' not printed in:
> event syntax error: '{tma_info_system_socket_clks/metric-id=tma_info_system_s..'
>                       \___ Bad event or PMU
>
> Unable to find PMU or event on a PMU of 'tma_info_system_socket_clks'
>
> Initial error:
> event syntax error: '{tma_info_system_socket_clks/metric-id=tma_info_system_s..'
>                       \___ Cannot find PMU `tma_info_system_socket_clks'. Missing kernel support?
> Testing tma_info_system_socket_clks

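This is a similar metric-generation bug, but a different one, since no
mismatched PMUs are involved: the metric name tma_info_system_socket_clks
is being parsed as if it were an event/PMU:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L1459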
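One plausible guard, sketched in Python under the assumption that the generator has the set of known metric names at hand: identifiers in a metric expression that name a metric are kept as metric references and never emitted as events. split_references, the identifier regex and the keyword list are hypothetical simplifications of the real metric expression grammar, and the example expression is made up for illustration.

```python
import re

# Simplified identifier pattern for metric expressions (assumption: the real
# grammar also has literals such as #smt_on and escaped names).
IDENT = re.compile(r"\b[A-Za-z_][A-Za-z0-9_.]*")
# Expression keywords that are neither events nor metrics (partial list).
KEYWORDS = {"if", "else"}


def split_references(expr, known_metrics):
    """Split identifiers in expr into (events, metric_refs), in first-seen order.

    Hypothetical guard: a name that matches a known metric is expanded as a
    metric reference and is never handed to perf's event parser.
    """
    events, metric_refs = [], []
    for name in IDENT.findall(expr):
        if name in KEYWORDS:
            continue
        target = metric_refs if name in known_metrics else events
        if name not in target:
            target.append(name)
    return events, metric_refs


# Hypothetical expression, for illustration only.
print(split_references("tma_info_system_socket_clks / duration_time / 1e9",
                       {"tma_info_system_socket_clks"}))
# → (['duration_time'], ['tma_info_system_socket_clks'])
```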

I also see what may be a PMU driver bug in:
```
...
Metric 'tma_slow_pause' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
 Average synthesis took: 11.657 usec (+- 0.039 usec)
 Average num. events: 4.000 (+- 0.000)
 Average time per event 2.914 usec
 Average data synthesis took: 11.832 usec (+- 0.037 usec)
 Average num. events: 13.000 (+- 0.000)
 Average time per event 0.910 usec

Performance counter stats for 'perf bench internals synthesize':

    <not counted>      cpu_core/TOPDOWN.SLOTS/                           (0.00%)
    <not counted>      cpu_core/topdown-retiring/                        (0.00%)
    <not counted>      cpu_core/topdown-mem-bound/                       (0.00%)
    <not counted>      cpu_core/topdown-bad-spec/                        (0.00%)
    <not counted>      cpu_core/topdown-fe-bound/                        (0.00%)
    <not counted>      cpu_core/topdown-be-bound/                        (0.00%)
    <not counted>      cpu_core/RESOURCE_STALLS.SCOREBOARD/              (0.00%)
    <not counted>      cpu_core/EXE_ACTIVITY.1_PORTS_UTIL/               (0.00%)
    <not counted>      cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/             (0.00%)
    <not counted>      cpu_core/CPU_CLK_UNHALTED.PAUSE/                  (0.00%)
    <not counted>      cpu_core/CYCLE_ACTIVITY.STALLS_TOTAL/             (0.00%)
    <not counted>      cpu_core/CPU_CLK_UNHALTED.THREAD/                 (0.00%)
    <not counted>      cpu_core/ARITH.DIV_ACTIVE/                        (0.00%)
    <not counted>      cpu_core/EXE_ACTIVITY.2_PORTS_UTIL,umask=0xc/     (0.00%)
    <not counted>      cpu_core/EXE_ACTIVITY.3_PORTS_UTIL,umask=0x80/    (0.00%)

      0.327060340 seconds time elapsed

      0.114906000 seconds user
      0.210001000 seconds sys
...
```

since adding --metric-no-group fixes the issue. Adding --metric-no-group
shouldn't be necessary: perf_event_open should be failing, which would
cause the weak group to be broken up (hence the possible PMU driver
bug). Perhaps something is wrong with weak group breaking on hybrid.
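To make the expected behavior concrete, here is a simplified Python simulation of weak-group handling as described above: if opening the whole group fails, the group is broken up and each event is opened on its own. try_open is a hypothetical stand-in for perf_event_open, and this is not perf's actual implementation. It also shows why the output above is suspicious: if the open succeeds even though the events can never be scheduled together, the fallback never runs and the counters read as <not counted>.

```python
def open_weak_group(events, try_open):
    """Simulate opening a weak (":W") event group.

    try_open(group) is a hypothetical stand-in for perf_event_open on a group
    leader plus its members; it returns True if the open succeeds. Returns
    the list of groups that ended up open.
    """
    if try_open(list(events)):
        # The whole group opened, so it stays a single group.
        return [list(events)]
    # Weak group: break it up and open each event on its own.
    return [[ev] for ev in events if try_open([ev])]


def fits_three(group):
    # Hypothetical PMU that can only fit 3 events in one group.
    return len(group) <= 3


print(open_weak_group(["a", "b", "c", "d"], fits_three))
# A driver that accepts any group at open time never triggers the fallback,
# even if the events can never be scheduled together.
print(open_weak_group(["a", "b", "c", "d"], lambda group: True))
```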

Thanks,
Ian

> - Arnaldo