linux-kernel - [RFC] perf tool improvement requests

Message-ID: <CABPqkBT0kEMOnTw1-E1kezCvtWNSws7dp2qzCWKVmjGPOEga8Q@mail.gmail.com>
Date:   Mon, 3 Sep 2018 19:45:48 -0700
From:   Stephane Eranian <eranian@...gle.com>
To:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:     Jiri Olsa <jolsa@...hat.com>, Jiri Olsa <jolsa@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Namhyung Kim <namhyung@...nel.org>
Subject: [RFC] perf tool improvement requests

Hi Arnaldo, Jiri,

A few weeks ago, you asked if I had more requests for the perf tool.
I have put together the following list to improve the usability of the
perf tool, at least for our usage. Nothing here is very big, just small
improvements here and there.

1/ perf stat interval printing

    Today, the timestamp printed via perf stat -I is relative to the
start of the measurements. It would be beneficial to also support a
mode where it uses a clock source that can be synchronized with other
traces or profiles, for instance gettimeofday() or
clock_gettime(CLOCK_MONOTONIC).
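
As a rough illustration of the idea, the Python sketch below shifts interval offsets onto CLOCK_MONOTONIC after the fact. It assumes the caller samples the clock immediately before launching perf stat -I, so the result is only approximate. The helper name to_absolute and the offsets are made up for the example and are not perf output.

```python
import time


def to_absolute(relative_seconds, start_monotonic):
    """Shift offsets (seconds since measurement start) onto a shared clock."""
    return [start_monotonic + offset for offset in relative_seconds]


# Assumption: the clock is read right before perf stat -I is started.
start = time.clock_gettime(time.CLOCK_MONOTONIC)

# Made-up interval offsets, standing in for the first column of -I output.
offsets = [1.000123, 2.000245, 3.000301]
absolute = to_absolute(offsets, start)
```

A built-in mode would avoid the skew between reading the clock and the actual start of the measurement, which is why doing it inside perf is preferable.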

2/ perf report event grouping

  If you do:
  $ perf record -e '{ cycles, instructions, branches }' ....
  $ perf report
  it will show the 3 profiles together, which is VERY useful. However,
the output is confusing because it is hard to tell which % corresponds
to which event. I know it is cmdline order, but it would be good to
have a header in the columns naming the events, instead of
guessing. A few times, I had to resort to perf report --header-only to
figure out the event order. I discovered the 'i' key on the function
profile, but it is still hard to find the events, especially if you
passed many of them.
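
To make the request concrete, here is a small Python sketch of the labeling I have in mind: it splits the group string from the command line and pairs each percentage column with its event name. The function names and the percentages are hypothetical, and this is not how perf report is implemented.

```python
def parse_group(spec):
    """Split a '{a, b, c}' event group string into a list of event names."""
    inner = spec.strip()
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    return [name.strip() for name in inner.split(",") if name.strip()]


def label_columns(events, percents):
    """Pair each percentage column with its event, in command-line order."""
    if len(events) != len(percents):
        raise ValueError("expected one percentage per event")
    return list(zip(events, percents))


events = parse_group("{ cycles, instructions, branches }")

# Made-up percentages for one symbol row, for illustration only.
row = label_columns(events, [34.2, 12.5, 8.1])

# A column header built from the event names, right-aligned.
header = "  ".join(f"{name:>12}" for name in events)
```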

3/ annotate output of loops

Percent│401f00:   xor    %eax,%eax
            │401f02:   test   %edi,%edi
            │401f04: ↓ jle    401f2b <triad+0x2b>
            │401f06:   nopw   %cs:0x0(%rax,%rax,1)
  34.20 │401f1┌─→  movsd  (%rcx,%rax,8),%xmm1
  14.60 │401f1│:   mulsd  %xmm0,%xmm1
  33.24 │401f1│:   addsd  (%rdx,%rax,8),%xmm1
    9.98 │401f1│:   movsd  %xmm1,(%rsi,%rax,8)
    0.10 │401f2│:   add    $0x1,%rax
    0.03 │401f2├──  cmp    %eax,%edi
    7.84 │401f2└──↑ jg     401f10 <triad+0x10>
            │401f2b:   mov    $0x18,%eax
            │401f30: ← retq

    The loop arrows cut through the code addresses. That is annoying!
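
    One possible layout, sketched in Python under my own assumptions, keeps the address in a fixed-width column and draws the arrows in a column of their own. The function name and column widths are hypothetical.

```python
def format_line(percent, addr, arrow, insn, addr_width=8, arrow_width=4):
    """Render one annotate line with separate address and arrow columns."""
    pct = f"{percent:6.2f}" if percent is not None else " " * 6
    return f"{pct} │{addr:<{addr_width}}│{arrow:<{arrow_width}} {insn}"


# Loop head: the full address stays readable next to the arrow.
top = format_line(34.20, "401f10:", "┌─→", "movsd  (%rcx,%rax,8),%xmm1")

# A line outside the loop, with no samples and no arrow.
plain = format_line(None, "401f00:", "", "xor    %eax,%eax")
```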

4/ sorting and event groups

       If I do:
       $  perf record -e '{cycles,instructions}'
       $ perf report
       it will sort the samples based on the first event (the leader) of the
group. Yet here all events are sampling events, so you could just as well sort
by the second event. But I don't think perf report supports a sort
order on multiple events. Both are from the same category: syms (or
ip).

        Right now, I would have to collect another profile:
       $  perf record -e '{instructions,cycles}'
       $ perf report
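
        Since the samples for every event in the group are already in one profile, re-sorting is only a matter of picking a different column. A minimal Python sketch, with made-up rows and a hypothetical function name:

```python
def sort_by_event(rows, events, key_event):
    """Sort (symbol, [percent per event]) rows by the chosen event's column."""
    idx = events.index(key_event)
    return sorted(rows, key=lambda row: row[1][idx], reverse=True)


events = ["cycles", "instructions"]

# Made-up per-symbol percentages, one value per event in group order.
rows = [
    ("triad", [60.0, 30.0]),
    ("init", [25.0, 55.0]),
    ("main", [15.0, 15.0]),
]

by_cycles = sort_by_event(rows, events, "cycles")
by_instructions = sort_by_event(rows, events, "instructions")
```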

5/ cgroups

    Today, to measure multiple events in the same cgroup, you need to do:
     $ perf stat -e cycles,branches,instructions -G foo,foo,foo .....

     You need to specify the cgroup N times for N events. It would be
good to support a mode where you only have to specify each cgroup once:

      $ perf stat -e cycles,branches,instructions --cgroup-all foo,bar

      This would measure cycles,branches,instructions for both cgroups foo and bar.
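
      Until such a mode exists, a wrapper can do the expansion. The Python sketch below builds the repeated -e and -G lists, assuming each cgroup in the -G list applies to the event at the same position in the -e list, as in the command above. The function name is hypothetical, and --cgroup-all itself is only a proposal.

```python
def expand_cgroups(events, cgroups):
    """Repeat every event once per cgroup and build matching -e/-G lists."""
    event_list = []
    cgroup_list = []
    for cgroup in cgroups:
        for event in events:
            event_list.append(event)
            cgroup_list.append(cgroup)
    return ["-e", ",".join(event_list), "-G", ",".join(cgroup_list)]


events = ["cycles", "branches", "instructions"]

# One cgroup: reproduces the -G foo,foo,foo form.
single = expand_cgroups(events, ["foo"])

# Two cgroups: every event is listed once per cgroup.
both = expand_cgroups(events, ["foo", "bar"])
```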


6/ perf script ip vs. callchain

     I already submitted this request separately. It is about
providing a way to print the callchain separately from the ip in
perf script. Right now, they are lumped together, which is not always
useful. Also, the callchain is currently a multi-line output, which is
not useful. perf script should stick with one line per sample, at
least when symbolization is off. We have examples of that with
brstack.
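
     For reference, this is roughly the post-processing that a one-line format would make unnecessary. The Python sketch assumes a simplified layout: an unindented line starts a sample, indented lines are its callchain frames, and blank lines separate samples. The sample text, the separator, and the function name are made up, and real perf script output has more fields.

```python
def one_line_per_sample(text, sep=";"):
    """Fold each sample's indented callchain lines onto its header line."""
    samples = []
    header, frames = None, []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separators carry no data
        if line[0].isspace():
            if header is not None:
                frames.append(line.strip())
        else:
            if header is not None:
                samples.append((header, frames))
            header, frames = line.strip(), []
    if header is not None:
        samples.append((header, frames))
    return [h + " " + sep.join(f) if f else h for h, f in samples]


# Made-up input in the simplified layout described above.
text = """\
prog 1234 cycles:
    aaaa f1
    bbbb f2

prog 1234 cycles:
    cccc f3
"""

lines = one_line_per_sample(text)
```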

I may have more requests, but I wanted to start with these for now.
Thanks for your efforts.