lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZXC1U8y4JAUaQ6lm@kernel.org>
Date:   Wed, 6 Dec 2023 14:54:27 -0300
From:   Arnaldo Carvalho de Melo <acme@...nel.org>
To:     Ian Rogers <irogers@...gle.com>
Cc:     Ayush Jain <ayush.jain3@....com>,
        Sandipan Das <sandipan.das@....com>,
        linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        peterz@...radead.org, Ingo Molnar <mingo@...nel.org>,
        mark.rutland@....com, alexander.shishkin@...ux.intel.com,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Adrian Hunter <adrian.hunter@...el.com>, kjain@...ux.ibm.com,
        atrajeev@...ux.vnet.ibm.com, barnali@...ux.ibm.com,
        ananth.narayan@....com, ravi.bangoria@....com,
        santosh.shukla@....com
Subject: Re: [PATCH] perf test: Retry without grouping for all metrics test

Em Wed, Dec 06, 2023 at 08:35:23AM -0800, Ian Rogers escreveu:
> On Wed, Dec 6, 2023 at 5:08 AM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
> > Humm, I'm not being able to reproduce here the problem, before applying
> > this patch:
 
> Please don't apply the patch. The patch masks a bug in metrics/PMUs

I didn't

> and the proper fix was:
> 8d40f74ebf21 perf vendor events amd: Fix large metrics
> https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com

that is upstream:

⬢[acme@...lbox perf-tools-next]$ git log tools/perf/pmu-events/arch/x86/amdzen1/recommended.json
commit 8d40f74ebf217d3b9e9b7481721e6236b857cc55
Author: Sandipan Das <sandipan.das@....com>
Date:   Thu Jul 6 12:04:40 2023 +0530

    perf vendor events amd: Fix large metrics

    There are cases where a metric requires more events than the number of
    available counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four
    data fabric counters but the "nps1_die_to_dram" metric has eight events.

    By default, the constituent events are placed in a group and since the
    events cannot be scheduled at the same time, the metric is not computed.
    The "all metrics" test also fails because of this.

    Use the NO_GROUP_EVENTS constraint for such metrics which anyway expect
    the user to run perf with "--metric-no-group".

    E.g.

      $ sudo perf test -v 101

    Before:

      101: perf all metrics test                                           :
      --- start ---
      test child forked, pid 37131
      Testing branch_misprediction_ratio
      Testing all_remote_links_outbound
      Testing nps1_die_to_dram
      Metric 'nps1_die_to_dram' not printed in:
      Error:
      Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
      Testing macro_ops_dispatched
      Testing all_l2_cache_accesses
      Testing all_l2_cache_hits
      Testing all_l2_cache_misses
      Testing ic_fetch_miss_ratio
      Testing l2_cache_accesses_from_l2_hwpf
      Testing l2_cache_misses_from_l2_hwpf
      Testing op_cache_fetch_miss_ratio
      Testing l3_read_miss_latency
      Testing l1_itlb_misses
      test child finished with -1
      ---- end ----
      perf all metrics test: FAILED!

    After:

      101: perf all metrics test                                           :
      --- start ---
      test child forked, pid 43766
      Testing branch_misprediction_ratio
      Testing all_remote_links_outbound
      Testing nps1_die_to_dram
      Testing macro_ops_dispatched
      Testing all_l2_cache_accesses
      Testing all_l2_cache_hits
      Testing all_l2_cache_misses
      Testing ic_fetch_miss_ratio
      Testing l2_cache_accesses_from_l2_hwpf
      Testing l2_cache_misses_from_l2_hwpf
      Testing op_cache_fetch_miss_ratio
      Testing l3_read_miss_latency
      Testing l1_itlb_misses
      test child finished with 0
      ---- end ----
      perf all metrics test: Ok

    Reported-by: Ayush Jain <ayush.jain3@....com>
    Suggested-by: Ian Rogers <irogers@...gle.com>
    Signed-off-by: Sandipan Das <sandipan.das@....com>
    Acked-by: Ian Rogers <irogers@...gle.com>
    Cc: Adrian Hunter <adrian.hunter@...el.com>
    Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
    Cc: Ananth Narayan <ananth.narayan@....com>
    Cc: Ingo Molnar <mingo@...hat.com>
    Cc: Jiri Olsa <jolsa@...nel.org>
    Cc: Mark Rutland <mark.rutland@....com>
    Cc: Namhyung Kim <namhyung@...nel.org>
    Cc: Peter Zijlstra <peterz@...radead.org>
    Cc: Ravi Bangoria <ravi.bangoria@....com>
    Cc: Santosh Shukla <santosh.shukla@....com>
    Link: https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@...hat.com
 
> > Ian, I also stumbled on this:

> > [root@...e ~]# perf stat -M dram_channel_data_controller_4
> > Cannot find metric or group `dram_channel_data_controller_4'
> > ^C
> >  Performance counter stats for 'system wide':

> >         284,908.91 msec cpu-clock                        #   32.002 CPUs utilized
> >          6,485,456      context-switches                 #   22.763 K/sec
> >                719      cpu-migrations                   #    2.524 /sec
> >             32,800      page-faults                      #  115.125 /sec

<SNIP>

> > I.e. -M should bail out at that point (Cannot find metric or group `dram_channel_data_controller_4'), no?

> We could. I suspect the code has always just not bailed out. I'll put
> together a patch adding the bail out.

Great, thanks,

- Arnaldo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ