Message-ID: <CAP-5=fVQc_FioO2QAGrW2B7QMQN8TyD1_Ns=rMNxmGQ9hhPnYQ@mail.gmail.com>
Date: Mon, 10 Jul 2023 12:13:36 -0700
From: Ian Rogers <irogers@...gle.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com,
linux-kernel@...r.kernel.org,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Ahmad Yasin <ahmad.yasin@...el.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Athira Rajeev <atrajeev@...ux.vnet.ibm.com>,
Caleb Biggers <caleb.biggers@...el.com>,
Edward Baker <edward.baker@...el.com>,
Florian Fischer <florian.fischer@...q.space>,
Ingo Molnar <mingo@...hat.com>,
James Clark <james.clark@....com>,
Jiri Olsa <jolsa@...nel.org>,
John Garry <john.g.garry@...cle.com>,
Kajol Jain <kjain@...ux.ibm.com>,
Kang Minchul <tegongkang@...il.com>,
Leo Yan <leo.yan@...aro.org>,
Mark Rutland <mark.rutland@....com>,
Namhyung Kim <namhyung@...nel.org>,
Perry Taylor <perry.taylor@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Ravi Bangoria <ravi.bangoria@....com>,
Rob Herring <robh@...nel.org>,
Samantha Alt <samantha.alt@...el.com>,
Stephane Eranian <eranian@...gle.com>,
Sumanth Korikkar <sumanthk@...ux.ibm.com>,
Suzuki Poulouse <suzuki.poulose@....com>,
Thomas Richter <tmricht@...ux.ibm.com>,
Tiezhu Yang <yangtiezhu@...ngson.cn>,
Weilin Wang <weilin.wang@...el.com>,
Xing Zhengjun <zhengjun.xing@...ux.intel.com>,
Yang Jihong <yangjihong1@...wei.com>,
linux-perf-users@...r.kernel.org
Subject: Re: [linus:master] [perf parse] 70c90e4a6b: perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1.fail
Hi, and thanks for the report. I'm confused by the output, specifically:
Direct leak of 17544 byte(s) in 51 object(s) allocated from:
    #0 0x7f49ee50c037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x556656895a6b in map__new2 util/map.c:226
    #2 0x55665687a6ac in machine__addnew_module_map util/machine.c:1039
    #3 0x556656880bfa in machine__process_kernel_mmap_event util/machine.c:1809
    #4 0x556656882eb7 in machine__process_mmap_event util/machine.c:1996
    #5 0x5566567426bd in perf_event__process_mmap util/event.c:370
    #6 0x5566568b3536 in machines__deliver_event util/session.c:1565
    #7 0x5566568b4e16 in perf_session__deliver_event util/session.c:1645
    #8 0x5566568b7ea1 in perf_session__process_event util/session.c:1881
    #9 0x5566568bed4d in process_simple util/session.c:2442
    #10 0x5566568bdd9d in reader__read_event util/session.c:2371
    #11 0x5566568be6dd in reader__process_events util/session.c:2420
    #12 0x5566568bf506 in __perf_session__process_events util/session.c:2467
    #13 0x5566568c243e in perf_session__process_events util/session.c:2633
    #14 0x5566563ff7d9 in __cmd_report /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/builtin-report.c:989
    #15 0x55665640be73 in cmd_report /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/builtin-report.c:1709
    #16 0x5566566e0d7f in run_builtin /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/perf.c:323
    #17 0x5566566e1601 in handle_internal_command /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/perf.c:377
    #18 0x5566566e1b33 in run_argv /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/perf.c:421
    #19 0x5566566e225f in main /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/perf.c:537
    #20 0x7f49ed6b3d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
It shows a map being leaked, yet apparently without the reference count
checker being enabled, which shouldn't happen given:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/tools/lib/perf/include/internal/rc_check.h#n12
Trying to look further, the blamed line is the closing curly brace of a function:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/tools/perf/util/machine.c#n1039
As such, I'm not sure there is anything actionable here, and I suspect
the underlying issues were fixed by the numerous reference count
checker fixes to the perf tool.
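
For readers unfamiliar with the idea, here is a minimal sketch of how a reference count checker of this kind can work. It is not the perf implementation: every name below (SKETCH_REFCNT_CHECKING, struct obj, obj_new, obj_get, obj_put) is hypothetical, and the sanitizer detection is only assumed to resemble what rc_check.h does. When checking is on, each reference is its own small allocation pointing at the shared object, so a missed put shows up as a leak of that small wrapper at the place the reference was taken.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: enable checking when AddressSanitizer is detected at compile time. */
#if defined(__SANITIZE_ADDRESS__)
#define SKETCH_REFCNT_CHECKING 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define SKETCH_REFCNT_CHECKING 1
#endif
#endif
#ifndef SKETCH_REFCNT_CHECKING
#define SKETCH_REFCNT_CHECKING 0
#endif

/* Hypothetical reference counted object, standing in for something like a map. */
struct obj {
	int refcnt;
	int payload;
};

#if SKETCH_REFCNT_CHECKING
/* With checking on, every reference is a separate heap allocation. */
struct obj_ref {
	struct obj *orig;
};
typedef struct obj_ref *obj_handle;
#else
/* With checking off, a handle is just the object pointer. */
typedef struct obj *obj_handle;
#endif

/* Resolve a handle to the underlying shared object. */
static struct obj *obj_of(obj_handle h)
{
#if SKETCH_REFCNT_CHECKING
	return h->orig;
#else
	return h;
#endif
}

/* Create a handle for an object; allocates a wrapper only when checking. */
static obj_handle wrap(struct obj *o)
{
#if SKETCH_REFCNT_CHECKING
	struct obj_ref *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->orig = o;
	return r;
#else
	return o;
#endif
}

/* Allocate a new object holding one reference. */
obj_handle obj_new(int payload)
{
	struct obj *o = calloc(1, sizeof(*o));
	obj_handle h;

	if (!o)
		return NULL;
	o->refcnt = 1;
	o->payload = payload;
	h = wrap(o);
	if (!h)
		free(o);
	return h;
}

/* Take an extra reference; returns a distinct handle when checking is on. */
obj_handle obj_get(obj_handle h)
{
	obj_handle n = wrap(obj_of(h));

	if (n)
		obj_of(n)->refcnt++;
	return n;
}

/* Drop a reference; returns the number of references left. */
int obj_put(obj_handle h)
{
	struct obj *o = obj_of(h);
	int left;

	assert(o->refcnt > 0);
	left = --o->refcnt;
#if SKETCH_REFCNT_CHECKING
	free(h);	/* the wrapper dies with this reference */
#endif
	if (left == 0)
		free(o);
	return left;
}
```

In this sketch, a missing obj_put() in a sanitizer build would leave a pointer-sized wrapper as the directly leaked allocation. The report above instead shows 17544 / 51 = 344 bytes per leaked object allocated by calloc in map__new2, which is why it looks as though no such indirection was in effect.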
Thanks,
Ian
On Sun, Jul 9, 2023 at 8:10 PM kernel test robot <oliver.sang@...el.com> wrote:
>
>
> hi Ian Rogers,
>
> when we reported
> "[linux-next:master] [perf parse] 70c90e4a6b: perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1.fail"
> on
> https://lore.kernel.org/all/202306161546.17ace7b9-oliver.sang@intel.com/
> while this commit was still on linux-next, you mentioned it should be fixed by
> https://lore.kernel.org/r/20230608232823.4027869-20-irogers@google.com
> which we noticed is already on mainline now.
> "1981da1fe2499 perf machine: Don't leak module maps"
>
> now we noticed the commit is on mainline already, and the issues still seem to
> exist. when this bisect finished we also tested the latest linus/master and
> linux-next/master, which we confirmed both include 1981da1fe2499, but we found
> the tests still failed. so we are sending this report again FYI.
>
>
> Hello,
>
> kernel test robot noticed "perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1.fail" on:
>
> commit: 70c90e4a6b2fbe775b662eafefae51f64d627790 ("perf parse-events: Avoid scanning PMUs before parsing")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master 1c7873e3364570ec89343ff4877e0f27a7b21a61]
> [test failed on linux-next/master 123212f53f3e394c1ae69a58c05dfdda56fec8c6]
>
> in testcase: perf-test
> version: perf-test-x86_64-git-1_20220520
> with the following parameters:
>
> type: lkp
> group: group-00
>
> test-description: The internal Perf Test suite.
>
>
> compiler: gcc-12
> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) with 256G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add the following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202307101059.86ea1eac-oliver.sang@intel.com
>
>
> besides, we also noticed that several other cases fail on this commit but pass
> on its parent:
>
> 442eeb77044705f2 70c90e4a6b2fbe775b662eafefa
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :6          100%           6:6     perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1.fail
>            :6          100%           6:6     perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_NO_FIX_HAS_NMI_R1.fail
>            :6          100%           6:6     perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_instructions_k_HAS_FIX_NO_NMI_R1.fail
>            :6          100%           6:6     perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_instructions_k_NO_FIX_HAS_NMI_R1.fail
>
>
>
> 28 test cases pass for perf_hw_event_sample_group test. 4 test cases fail for perf_hw_event_sample_group test.
> Test Case sampe_cpu-cycles_cache-misses_instructions_NO_FIX_HAS_NMI_R1 PASS!
> Test Case group_sampe_cpu-cycles_cache-misses_NO_FIX_HAS_NMI_R1 PASS!
> Test Case group_sampe_cache-misses_instructions_u_NO_FIX_HAS_NMI_R1 PASS!
> Test Case group_sampe_cpu-cycles_instructions_k_NO_FIX_HAS_NMI_R1 FAILED! <----------
> Test Case group_sampe_cpu-cycles_cache-misses_and_cache-misses_instructions_NO_FIX_HAS_NMI_R1 PASS!
> Test Case group_sampe_cpu-cycles_cache-misses_instructions_NO_FIX_HAS_NMI_R1 PASS!
> Test Case group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_NO_FIX_HAS_NMI_R1 FAILED! <----------
> Test Case group_sampe_cpu-cycles_cache-misses_instructions_and_cpu-cycles_cache-misses_instructions_NO_FIX_HAS_NMI_R1 PASS!
> Test Case sampe_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1 PASS!
> Test Case group_sampe_cpu-cycles_cache-misses_HAS_FIX_NO_NMI_R1 PASS!
> Test Case group_sampe_cache-misses_instructions_u_HAS_FIX_NO_NMI_R1 PASS!
> Test Case group_sampe_cpu-cycles_instructions_k_HAS_FIX_NO_NMI_R1 FAILED! <----------
> Test Case group_sampe_cpu-cycles_cache-misses_and_cache-misses_instructions_HAS_FIX_NO_NMI_R1 PASS!
> Test Case group_sampe_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1 PASS!
> Test Case group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1 FAILED! <----------
> Test Case group_sampe_cpu-cycles_cache-misses_instructions_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1 PASS!
> Test Case sampe_bus-cycles_bus-cycles_branch-misses_NO_FIX_HAS_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_bus-cycles_NO_FIX_HAS_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_branch-misses_u_NO_FIX_HAS_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_branch-misses_k_NO_FIX_HAS_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_bus-cycles_and_bus-cycles_branch-misses_NO_FIX_HAS_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_bus-cycles_branch-misses_NO_FIX_HAS_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_bus-cycles_and_bus-cycles_bus-cycles_branch-misses_NO_FIX_HAS_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_bus-cycles_branch-misses_and_bus-cycles_bus-cycles_branch-misses_NO_FIX_HAS_NMI_R0 PASS!
> Test Case sampe_bus-cycles_bus-cycles_branch-misses_HAS_FIX_NO_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_bus-cycles_HAS_FIX_NO_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_branch-misses_u_HAS_FIX_NO_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_branch-misses_k_HAS_FIX_NO_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_bus-cycles_and_bus-cycles_branch-misses_HAS_FIX_NO_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_bus-cycles_branch-misses_HAS_FIX_NO_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_bus-cycles_and_bus-cycles_bus-cycles_branch-misses_HAS_FIX_NO_NMI_R0 PASS!
> Test Case group_sampe_bus-cycles_bus-cycles_branch-misses_and_bus-cycles_bus-cycles_branch-misses_HAS_FIX_NO_NMI_R0 PASS!
> perf hardware cache event sample group test
>
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> sudo bin/lkp install job.yaml # job file is attached in this email
> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> sudo bin/lkp run generated-yaml-file
>
> # if you come across any failure that blocks the test,
> # please remove ~/.lkp and the /lkp dir to run from a clean state.
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
>