Message-ID: <CAP-5=fV2592hai4wGhOrCBYFwTxDnre7wdJwBRHgdPf6KXYfTA@mail.gmail.com>
Date: Thu, 20 Apr 2023 23:40:25 -0700
From: Ian Rogers <irogers@...gle.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com,
linux-perf-users@...r.kernel.org,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Alexey Bayduraev <alexey.v.bayduraev@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Darren Hart <dvhart@...radead.org>,
Davidlohr Bueso <dave@...olabs.net>,
Dmitriy Vyukov <dvyukov@...gle.com>,
Eric Dumazet <edumazet@...gle.com>,
German Gomez <german.gomez@....com>,
Hao Luo <haoluo@...gle.com>, Ingo Molnar <mingo@...hat.com>,
James Clark <james.clark@....com>,
Jiri Olsa <jolsa@...nel.org>,
John Garry <john.g.garry@...cle.com>,
Kajol Jain <kjain@...ux.ibm.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Leo Yan <leo.yan@...aro.org>,
Madhavan Srinivasan <maddy@...ux.ibm.com>,
Mark Rutland <mark.rutland@....com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Miaoqian Lin <linmq006@...il.com>,
Namhyung Kim <namhyung@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Riccardo Mancini <rickyman7@...il.com>,
Shunsuke Nakamura <nakamura.shun@...itsu.com>,
Song Liu <song@...nel.org>,
Stephane Eranian <eranian@...gle.com>,
Stephen Brennan <stephen.s.brennan@...cle.com>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Thomas Richter <tmricht@...ux.ibm.com>,
Yury Norov <yury.norov@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: [acme:tmp.perf/core] [perf map] ec417ad4c6: perf-sanity-tests.Test_dwarf_unwind.fail
On Thu, Apr 20, 2023 at 11:02 PM kernel test robot
<oliver.sang@...el.com> wrote:
>
>
> Hello,
>
> kernel test robot noticed "perf-sanity-tests.Test_dwarf_unwind.fail" on:
>
> commit: ec417ad4c691b5d90ab13cf26789e8719468ae39 ("perf map: Changes to reference counting")
> https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git tmp.perf/core
>
> [test failed on linux-next/master 44bf136283e567b2b62653be7630e7511da41da2]
>
> in testcase: perf-sanity-tests
> version: perf-x86_64-00c7b5f4ddc5-1_20230402
> with following parameters:
>
> perf_compiler: gcc
>
>
>
> compiler: gcc-11
> test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> we also noticed below test failed on this commit but pass on parent:
>
> 392cf49ec54f0c7b ec417ad4c691b5d90ab13cf2678
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :6 33% 2:2 perf-sanity-tests.Check_branch_stack_sampling.fail
> :6 33% 2:2 perf-sanity-tests.Test_dwarf_unwind.fail
> :6 33% 2:2 perf-sanity-tests.perf_record_tests.fail
>
>
>
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Link: https://lore.kernel.org/oe-lkp/202304211253.cbcd33b7-oliver.sang@intel.com
>
>
>
> 2023-04-20 17:08:51 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-func-ec417ad4c691b5d90ab13cf26789e8719468ae39/tools/perf/perf test 76
> 76: Test dwarf unwind : FAILED!
>
> ...
>
> 2023-04-20 17:09:39 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-func-ec417ad4c691b5d90ab13cf26789e8719468ae39/tools/perf/perf test 94
> 94: perf record tests : FAILED!
>
> ...
>
> 2023-04-20 17:16:40 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-func-ec417ad4c691b5d90ab13cf26789e8719468ae39/tools/perf/perf test 110
> 110: Check branch stack sampling : FAILED!
>
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> sudo bin/lkp install job.yaml # job file is attached in this email
> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> sudo bin/lkp run generated-yaml-file
>
> # if come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
The important detail missing from this report is that these tests are
failing under AddressSanitizer. Here is the first failure:
==742187==ERROR: AddressSanitizer: stack-buffer-underflow on address
0x7fffe253b430 at pc 0x7f2f2cc4814b bp 0x7fffe253b360 sp
0x7fffe253ab10
READ of size 8192 at 0x7fffe253b430 thread T0
#0 0x7f2f2cc4814a in __interceptor_memcpy
libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
#1 0x561de4706b25 in sample_ustack arch/x86/tests/dwarf-unwind.c:39
#2 0x561de4706cf7 in test__arch_unwind_sample
arch/x86/tests/dwarf-unwind.c:77
#3 0x561de43d8832 in test_dwarf_unwind__thread tests/dwarf-unwind.c:120
#4 0x561de43d8b08 in test_dwarf_unwind__compare tests/dwarf-unwind.c:152
#5 0x7f2f2bc5c47b in __GI_bsearch ../bits/stdlib-bsearch.h:33
#6 0x7f2f2cc4a4ac in __interceptor_bsearch
libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:10155
#7 0x7f2f2cc4a4ac in __interceptor_bsearch
libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:10150
#8 0x561de43d8cc9 in test_dwarf_unwind__krava_3 tests/dwarf-unwind.c:176
#9 0x561de43d8d4f in test_dwarf_unwind__krava_2 tests/dwarf-unwind.c:185
#10 0x561de43d8d92 in test_dwarf_unwind__krava_1 tests/dwarf-unwind.c:194
#11 0x561de43d90b5 in test__dwarf_unwind tests/dwarf-unwind.c:234
#12 0x561de434fe6c in run_test tests/builtin-test.c:238
#13 0x561de4350111 in test_and_print tests/builtin-test.c:267
#14 0x561de4350f40 in __cmd_test tests/builtin-test.c:404
#15 0x561de435240f in cmd_test tests/builtin-test.c:561
#16 0x561de43db29a in run_builtin tools/perf/perf.c:323
#17 0x561de43db80b in handle_internal_command tools/perf/perf.c:377
#18 0x561de43dbbd3 in run_argv tools/perf/perf.c:421
#19 0x561de43dc13b in main tools/perf/perf.c:537
Address 0x7fffe253b430 is located in stack of thread T0 at offset 0 in frame
#0 0x561de43d8716 in test_dwarf_unwind__thread tests/dwarf-unwind.c:113
This frame has 2 object(s):
[32, 40) 'cnt' (line 115) <== Memory access at offset 0 partially
underflows this variable
[64, 1440) 'sample' (line 114) <== Memory access at offset 0
partially underflows this variable
HINT: this may be a false positive if your program uses some custom
stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-underflow
../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
in __interceptor_memcpy
Shadow bytes around the buggy address:
0x10007c49f630: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007c49f640: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007c49f650: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007c49f660: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007c49f670: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10007c49f680: 00 00 00 00 00 00[f1]f1 f1 f1 00 f2 f2 f2 00 00
0x10007c49f690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007c49f6a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007c49f6b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007c49f6c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007c49f6d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==742187==ABORTING
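To read the shadow dump above, it helps to map the faulting address to
its shadow byte. The following is a minimal sketch, assuming the default
x86-64 Linux ASan mapping (shadow = (addr >> 3) + 0x7fff8000); the
helper names are hypothetical and are not part of perf or libsanitizer.

```c
#include <stdint.h>

/*
 * Assumption: default ASan shadow mapping on x86-64 Linux, where one
 * shadow byte covers 8 application bytes.
 */
#define ASAN_SHADOW_SCALE	3
#define ASAN_SHADOW_OFFSET	0x7fff8000ULL

/* Hypothetical helper: shadow byte address for an application address. */
static inline uint64_t asan_shadow_addr(uint64_t addr)
{
	return (addr >> ASAN_SHADOW_SCALE) + ASAN_SHADOW_OFFSET;
}

/* Start of the 16-byte row that the ASan report prints for a shadow byte. */
static inline uint64_t shadow_row_base(uint64_t shadow)
{
	return shadow & ~0xfULL;
}

/* Position (0..15) of the shadow byte within its printed row. */
static inline unsigned int shadow_row_index(uint64_t shadow)
{
	return (unsigned int)(shadow & 0xf);
}
```

For the reported address 0x7fffe253b430 this gives shadow byte
0x10007c49f686, i.e. index 6 of row 0x10007c49f680, which is the
bracketed [f1] (stack left redzone) in the dump. The four f1 bytes cover
frame offsets [0, 32), just below 'cnt' at [32, 40), so the 8192-byte
read in frame #1 starts in the left redzone of
test_dwarf_unwind__thread's frame.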
The faulting memcpy call corresponds to:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/arch/x86/tests/dwarf-unwind.c?h=perf-tools-next#n39
sp = (unsigned long) regs[PERF_REG_X86_SP];
...
memcpy(buf, (void *) sp, stack_size);
i.e. AddressSanitizer is failing because the test reads raw, non-data
values from the stack, which is an inherent property of the test. So
what's confusing in this report isn't that we see failures, but why
they are being reported now. Presumably the code changed and the robot
makes some effort to correlate the two.
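As an illustration of why the memcpy trips the check, and of one way a
raw stack copy could be kept out of ASan's view, here is a minimal
sketch. It assumes GCC or Clang, which both accept the
no_sanitize_address function attribute; copy_raw_bytes and NO_ASAN are
hypothetical names, not existing perf code, and whether something like
this suits the test is a separate question.

```c
#include <stddef.h>

/* Assumption: GCC or Clang; other compilers get an empty attribute. */
#if defined(__GNUC__) || defined(__clang__)
#define NO_ASAN __attribute__((no_sanitize_address, noinline))
#else
#define NO_ASAN
#endif

/*
 * Hypothetical helper: copy len bytes without ASan instrumentation.
 * memcpy() is intercepted by the ASan runtime even when the caller is
 * not instrumented, so the copy is done with a plain byte loop. The
 * volatile source keeps the compiler from turning the loop back into a
 * memcpy() call.
 */
NO_ASAN static size_t copy_raw_bytes(void *dst, const void *src, size_t len)
{
	const volatile unsigned char *s = src;
	unsigned char *d = dst;
	size_t i;

	for (i = 0; i < len; i++)
		d[i] = s[i];

	return len;
}
```

A caller would pass the sampled stack pointer as src, in the same way
the existing memcpy(buf, (void *) sp, stack_size) does. The loop should
then read through redzones without a report, since the function itself
carries no ASan checks.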
Thanks,
Ian
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests
>
>