Message-ID: <CAP-5=fUW0O2mVopONcTjwYLB6JsLY8HgGzggj=zw7R45wBhozA@mail.gmail.com>
Date: Wed, 16 Mar 2022 00:14:12 -0700
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Namhyung Kim <namhyung@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: 'perf stat --null sleep 1' segfault related to
evlist__for_each_entry with affinity == NULL
On Tue, Mar 15, 2022 at 5:50 AM Arnaldo Carvalho de Melo
<acme@...nel.org> wrote:
>
> This bounced yesterday, resending :-\
>
> -----
>
> Hi Ian,
>
> I just noticed this:
>
> (gdb) run stat --null sleep 1
> `/var/home/acme/bin/perf' has changed; re-reading symbols.
> Starting program: /var/home/acme/bin/perf stat --null sleep 1
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [Detaching after fork from child process 900002]
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000596790 in create_perf_stat_counter (evsel=0xda9580, config=0xcbeac0 <stat_config>, target=0xcbe8a0 <target>, cpu_map_idx=0) at util/stat.c:548
> 548 if (leader->core.nr_members > 1)
> Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-6.fc34.x86_64 cyrus-sasl-lib-2.1.27-8.fc34.x86_64 elfutils-debuginfod-client-0.186-1.fc34.x86_64 elfutils-libelf-0.186-1.fc34.x86_64 elfutils-libs-0.186-1.fc34.x86_64 glib2-2.68.4-1.fc34.x86_64 glibc-2.33-21.fc34.x86_64 keyutils-libs-1.6.1-2.fc34.x86_64 libbabeltrace-1.5.8-6.fc34.x86_64 libbrotli-1.0.9-4.fc34.x86_64 libcap-2.48-2.fc34.x86_64 libcom_err-1.45.6-5.fc34.x86_64 libcurl-7.76.1-12.fc34.x86_64 libgcc-11.2.1-9.fc34.x86_64 libidn2-2.3.2-1.fc34.x86_64 libnghttp2-1.43.0-2.fc34.x86_64 libpsl-0.21.1-3.fc34.x86_64 libssh-0.9.6-1.fc34.x86_64 libstdc++-11.2.1-9.fc34.x86_64 libunistring-0.9.10-10.fc34.x86_64 libunwind-1.4.0-5.fc34.x86_64 libuuid-2.36.2-1.fc34.x86_64 libxcrypt-4.4.28-1.fc34.x86_64 libzstd-1.5.2-1.fc34.x86_64 numactl-libs-2.0.14-3.fc34.x86_64 opencsd-1.1.1-1.fc34.x86_64 openldap-2.4.57-6.fc34.x86_64 openssl-libs-1.1.1l-2.fc34.x86_64 pcre-8.44-3.fc34.1.x86_64 perl-libs-5.32.1-477.fc34.x86_64 popt-1.18-4.fc34.x86_64 python3-libs-3.9.10-1.fc34.x86_64 slang-2.3.2-9.fc34.x86_64 xz-libs-5.2.5-5.fc34.x86_64
> (gdb) bt
> #0 0x0000000000596790 in create_perf_stat_counter (evsel=0xda9580, config=0xcbeac0 <stat_config>, target=0xcbe8a0 <target>, cpu_map_idx=0) at util/stat.c:548
> #1 0x0000000000438c2c in __run_perf_stat (argc=2, argv=0x7fffffffe1f0, run_idx=0) at builtin-stat.c:835
> #2 0x00000000004395d5 in run_perf_stat (argc=2, argv=0x7fffffffe1f0, run_idx=0) at builtin-stat.c:1047
> #3 0x000000000043ce17 in cmd_stat (argc=2, argv=0x7fffffffe1f0) at builtin-stat.c:2561
> #4 0x00000000004dc2d1 in run_builtin (p=0xcd5760 <commands+288>, argc=4, argv=0x7fffffffe1f0) at perf.c:313
> #5 0x00000000004dc538 in handle_internal_command (argc=4, argv=0x7fffffffe1f0) at perf.c:365
> #6 0x00000000004dc684 in run_argv (argcp=0x7fffffffe03c, argv=0x7fffffffe030) at perf.c:409
> #7 0x00000000004dca50 in main (argc=4, argv=0x7fffffffe1f0) at perf.c:539
> (gdb) p leader
> $1 = (struct evsel *) 0x0
> (gdb) fr 1
> #1 0x0000000000438c2c in __run_perf_stat (argc=2, argv=0x7fffffffe1f0, run_idx=0) at builtin-stat.c:835
> 835 if (create_perf_stat_counter(counter, &stat_config, &target,
> (gdb) p evsel_list
> $2 = (struct evlist *) 0xda9580
> (gdb) p evsel_list->core.nr_entries
> $3 = 0
> (gdb)
>
> I've bisected it to a patch I added that avoids setting up affinity when
> there is no need to do so, like when running 'perf stat sleep 1', this
> one:
>
> commit 49de179577e7b05b57f625bf05cdc60a72de38d0
> Author: Arnaldo Carvalho de Melo <acme@...hat.com>
> Date: Mon Jan 17 13:09:29 2022 -0300
>
> perf stat: No need to setup affinities when starting a workload
>
> I.e. the simple:
>
> $ perf stat sleep 1
>
> Uses a dummy CPU map and thus there is no need to setup/cleanup
> affinities to avoid IPIs, etc.
>
> With this we're down to a sched_getaffinity() call, in the libnuma
> initialization, that probably can be removed in a followup patch.
>
> Acked-by: Ian Rogers <irogers@...gle.com>
>
> The problem is that when using --null we end up with an empty evlist
> and:
>
> evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity)
>
> in __run_perf_stat()
>
> gets called with affinity == NULL.
>
> Looking at its code, it checks whether affinity is NULL, so I thought it
> would be OK, but IIRC it ends up passing the empty list head:
>
> (gdb) p evsel->core.node
> $4 = {next = 0xda9580, prev = 0xda9580}
> (gdb) p evsel
> $5 = (struct evsel *) 0xda9580
> (gdb) fr 1
> #1 0x0000000000438c2c in __run_perf_stat (argc=2, argv=0x7fffffffe1f0, run_idx=0) at builtin-stat.c:835
> 835 if (create_perf_stat_counter(counter, &stat_config, &target,
> (gdb) p evsel_list->core.entries
> $6 = {next = 0xda9580, prev = 0xda9580}
> (gdb)
>
> I.e. it gets confused with its end condition when
> evlist->core.nr_entries is zero.
>
> Can you take a look at improving this? I'm going AFK now, so just wanted
> to dump what I found so far.
>
> I guess the problem got introduced in:
>
> commit 472832d2c000b9611feaea66fe521055c3dbf17a
> Author: Ian Rogers <irogers@...gle.com>
> Date: Tue Jan 4 22:13:37 2022 -0800
>
> perf evlist: Refactor evlist__for_each_cpu()
>
> Previously evlist__for_each_cpu() needed to iterate over the evlist in
> an inner loop and call "skip" routines. Refactor this so that the
> iteratr is smarter and the next function can update both the current CPU
> and evsel.
>
> By using a cpu map index, fix apparent off-by-1 in __run_perf_stat's
> call to perf_evsel__close_cpu().
>
> -----------------
>
> And was dormant as the affinity was being always setup, which we don't
> want when it isn't needed.
Thanks for the detailed report, I sent a fix in:
https://lore.kernel.org/lkml/20220316071049.2368250-1-irogers@google.com/
Any thoughts on a shell test to avoid this regressing in the future?
Thanks,
Ian
> I'm trying to use:
>
> perf trace --summary perf stat --null sleep 1
>
> to see what it is that perf is unconditionally doing that should be
> deferred until it is really needed; for now I'll continue with:
>
> [root@...e ~]# perf trace -s perf stat -e dummy sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 0 dummy
>
> 1.000945692 seconds time elapsed
>
> 0.000000000 seconds user
> 0.000861000 seconds sys
>
>
>
> Summary of events:
>
> perf (900261), 3127 events, 94.7%
>
> syscall calls errors total min avg max stddev
> (msec) (msec) (msec) (msec) (%)
> --------------- -------- ------ -------- --------- --------- --------- ------
> wait4 1 0 1000.903 1000.903 1000.903 1000.903 0.00%
> openat 364 187 1.144 0.002 0.003 0.020 1.90%
> mmap 263 0 0.796 0.001 0.003 0.015 2.60%
> clone 1 0 0.620 0.620 0.620 0.620 0.00%
> read 263 0 0.441 0.001 0.002 0.094 21.06%
> access 80 49 0.277 0.002 0.003 0.006 2.75%
> mprotect 80 0 0.269 0.002 0.003 0.006 3.15%
> newfstatat 174 1 0.210 0.001 0.001 0.007 3.57%
> close 183 0 0.171 0.001 0.001 0.003 1.64%
> getdents64 35 0 0.065 0.001 0.002 0.012 17.24%
> write 18 0 0.052 0.001 0.003 0.009 15.24%
> brk 12 0 0.034 0.001 0.003 0.005 11.27%
> ioctl 29 29 0.029 0.001 0.001 0.003 8.38%
> futex 20 0 0.026 0.001 0.001 0.002 8.30%
> perf_event_open 1 0 0.016 0.016 0.016 0.016 0.00%
> statfs 3 0 0.011 0.002 0.004 0.006 38.92%
> rt_sigaction 8 0 0.009 0.001 0.001 0.002 12.76%
> pread64 6 0 0.006 0.001 0.001 0.001 1.44%
> pipe 2 0 0.006 0.002 0.003 0.004 32.45%
> rt_sigprocmask 4 0 0.006 0.001 0.001 0.002 22.35%
> prctl 6 4 0.005 0.001 0.001 0.001 6.86%
> munmap 1 0 0.004 0.004 0.004 0.004 0.00%
> kill 1 0 0.003 0.003 0.003 0.003 0.00%
> arch_prctl 2 1 0.002 0.001 0.001 0.001 11.96%
> sysinfo 1 0 0.002 0.002 0.002 0.002 0.00%
> sched_getaffinity 1 0 0.001 0.001 0.001 0.001 0.00%
> getpid 1 0 0.001 0.001 0.001 0.001 0.00%
> fcntl 1 0 0.001 0.001 0.001 0.001 0.00%
> set_tid_address 1 0 0.001 0.001 0.001 0.001 0.00%
> prlimit64 1 0 0.001 0.001 0.001 0.001 0.00%
> set_robust_list 1 0 0.001 0.001 0.001 0.001 0.00%
> execve 1 0 0.000 0.000 0.000 0.000 0.00%
>
>
> sleep (900262), 107 events, 3.2%
>
> syscall calls errors total min avg max stddev
> (msec) (msec) (msec) (msec) (%)
> --------------- -------- ------ -------- --------- --------- --------- ------
> clock_nanosleep 1 0 1000.050 1000.050 1000.050 1000.050 0.00%
> execve 8 7 0.282 0.008 0.035 0.198 66.26%
> read 2 0 0.058 0.002 0.029 0.056 94.56%
> mmap 9 0 0.027 0.002 0.003 0.005 9.92%
> mprotect 4 0 0.015 0.003 0.004 0.005 10.85%
> close 8 0 0.013 0.001 0.002 0.003 13.67%
> prctl 1 0 0.012 0.012 0.012 0.012 0.00%
> openat 3 0 0.010 0.003 0.003 0.004 4.99%
> arch_prctl 2 1 0.006 0.001 0.003 0.005 68.42%
> brk 3 0 0.006 0.001 0.002 0.002 21.26%
> access 1 1 0.005 0.005 0.005 0.005 0.00%
> munmap 1 0 0.005 0.005 0.005 0.005 0.00%
> newfstatat 3 0 0.005 0.001 0.002 0.002 8.55%
> pread64 4 0 0.005 0.001 0.001 0.001 5.39%
> rt_sigaction 1 0 0.002 0.002 0.002 0.002 0.00%
> fcntl 1 0 0.002 0.002 0.002 0.002 0.00%
> set_robust_list 1 0 0.002 0.002 0.002 0.002 0.00%
> clone 1 0 0.000 0.000 0.000 0.000 0.00%
>
>
> [root@...e ~]#
>
> - Arnaldo