[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z9w_aFlzOkoCTSqK@google.com>
Date: Thu, 20 Mar 2025 09:16:40 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Ingo Molnar <mingo@...nel.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
Ian Rogers <irogers@...gle.com>,
Kan Liang <kan.liang@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-perf-users@...r.kernel.org,
Dmitry Vyukov <dvyukov@...gle.com>
Subject: Re: [PATCH 1/3] perf sort: Keep output fields in the same level
Hi Ingo,
On Thu, Mar 20, 2025 at 10:32:39AM +0100, Ingo Molnar wrote:
>
> * Namhyung Kim <namhyung@...nel.org> wrote:
>
> > On Fri, Mar 07, 2025 at 12:08:27AM -0800, Namhyung Kim wrote:
> > > This is useful for hierarchy output mode where the first level is
> > > considered as output fields. We want them in the same level so that it
> > > can show only the remaining groups in the hierarchy.
> > >
> > > Before:
> > > $ perf report -s overhead,sample,period,comm,dso -H --stdio
> > > ...
> > > # Overhead Samples / Period / Command / Shared Object
> > > # ................. ..........................................
> > > #
> > > 100.00% 4035
> > > 100.00% 3835883066
> > > 100.00% perf
> > > 99.37% perf
> > > 0.50% ld-linux-x86-64.so.2
> > > 0.06% [unknown]
> > > 0.04% libc.so.6
> > > 0.02% libLLVM-16.so.1
> > >
> > > After:
> > > $ perf report -s overhead,sample,period,comm,dso -H --stdio
> > > ...
> > > # Overhead Samples Period Command / Shared Object
> > > # ....................................... .......................
> > > #
> > > 100.00% 4035 3835883066 perf
> > > 99.37% 4005 3811826223 perf
> > > 0.50% 19 19210014 ld-linux-x86-64.so.2
> > > 0.06% 8 2367089 [unknown]
> > > 0.04% 2 1720336 libc.so.6
> > > 0.02% 1 759404 libLLVM-16.so.1
> > >
> > > Signed-off-by: Namhyung Kim <namhyung@...nel.org>
> >
> > Ping! Anybody interested in this change? :)
>
> Oh yes, all such pieces of intelligent organization of textual output
> of profiling data are worth their weight in gold in my book. :-)
>
> Acked-by: Ingo Molnar <mingo@...nel.org>
Thanks a lot!
>
> 1)
>
> On a related note, does anyone know why perf stat output alignment
> sucks so much these days:
>
> starship:~/tip> perf stat --null --repeat 20 perf stat --null --repeat 3 perf bench sched messaging 2>&1 | grep elapsed
> 0.11620 +- 0.00327 seconds time elapsed ( +- 2.81% )
> 0.120813 +- 0.000570 seconds time elapsed ( +- 0.47% )
> 0.122280 +- 0.000443 seconds time elapsed ( +- 0.36% )
> 0.119813 +- 0.000752 seconds time elapsed ( +- 0.63% )
> 0.12190 +- 0.00134 seconds time elapsed ( +- 1.10% )
> 0.119862 +- 0.000542 seconds time elapsed ( +- 0.45% )
> 0.120075 +- 0.000608 seconds time elapsed ( +- 0.51% )
> 0.120350 +- 0.000273 seconds time elapsed ( +- 0.23% )
> 0.12203 +- 0.00114 seconds time elapsed ( +- 0.93% )
> 0.12229 +- 0.00114 seconds time elapsed ( +- 0.93% )
> 0.12032 +- 0.00115 seconds time elapsed ( +- 0.95% )
> 0.121241 +- 0.000463 seconds time elapsed ( +- 0.38% )
> 0.119404 +- 0.000333 seconds time elapsed ( +- 0.28% )
> 0.119945 +- 0.000766 seconds time elapsed ( +- 0.64% )
> 0.121215 +- 0.000879 seconds time elapsed ( +- 0.72% )
> 0.12001 +- 0.00109 seconds time elapsed ( +- 0.91% )
> 0.12193 +- 0.00182 seconds time elapsed ( +- 1.49% )
> 0.119184 +- 0.000794 seconds time elapsed ( +- 0.67% )
> 0.120062 +- 0.000439 seconds time elapsed ( +- 0.37% )
> 0.120834 +- 0.000760 seconds time elapsed ( +- 0.63% )
> 0.369473 +- 0.000992 seconds time elapsed ( +- 0.27% )
>
> ... see how the vertical alignment of the output goes randomly wacko -
> I presume because there's a trailing zero in the output number and the
> code for some inexplicable reason decides to shorten it to make the
> life of developers harder? ;-)
It seems it calcuates the precision when printing the result and doesn't
care much about the repeatations.
>
> 2)
>
> It's also incredibly hard to Ctrl-C a 'perf stat --repeat' instance:
>
> starship:~/tip> perf stat --null --repeat 20 perf stat --null --repeat 3 perf bench sched messaging
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> ...
> Ctrl-C
>
> # Running 'sched/messaging' benchmark:
> perf: pollperf: perf: pollperf: pollpollperf: pollperf: pollperf: : Interrupted system call
> : Interrupted system call
> poll: Interrupted system call
> perf: pollperf: : Interrupted system call
> perf: pollperf: pollpollperf: : Interrupted system call
> pollperf: pollperf: perf: perf: pollpollpollperf: : Interrupted system call
> pollperf: poll: Interrupted system call
> : Interrupted system call
> : Interrupted system call
> : Interrupted system call
> perf: poll: Interrupted system call
> perf: perf: pollpoll: Interrupted system call
> : Interrupted system call
> perf: perf: perf: perf: perf: perf: : Interrupted system call
> pollpollpollpollpollpoll: Interrupted system call
> : Interrupted system call
> : Interrupted system call
> perf: perf: pollperf: perf: perf: perf: perf: perf: pollperf: : Interrupted system call
> pollpollpoll: Interrupted system call
>
> Note how the perf stat instance actually *hangs*. I have to Ctrl-Z it,
> and kill -9 %1 it the hard way to clean up:
>
> pollpollpoll: Interrupted system call
> �
> [1]+ Stopped perf stat --null --repeat 20 perf stat --null --repeat 3 perf bench sched messaging
> starship:~/tip> kill -9 %1
>
> [1]+ Stopped perf stat --null --repeat 20 perf stat --null --repeat 3 perf bench sched messaging
> starship:~/tip> kill -9 %1
>
> Does anyone use this thing for actual benchmarking work? ;-)
Oh, ok. I'll take a look.
>
> 3)
>
> It would also be nice to be able to Ctrl-C out of a 'perf top' instance
> that freezes the output or so. Or prints a snapshot in ASCII. Anything
> but what it does currently: it just exits and clears the xterm screen
> of all useful information...
>
> I have to use 'f' (how many people know about that feature?) and copy &
> paste anything interesting from the screen the hard way.
Actually I was not aware of 'f' key. :) I think you can use 'P' to save
the current screen to a file.
>
> 4)
>
> It would also be nice to have an export function to save current 'perf
> top' profiling data and have it available for 'perf report' et al
> analysis. Ie. frequently I just see an interesting snapshot and decide
> that it's good for further analysis, freeze the screen and are left
> with very few options to keep it for further look and reference.
Hmm.. interesting. I'll think about it but it seems doable.
>
> 5)
>
> Would anyone be interested in an OpenGL-ish version of perf top, with
> its own low level shader for font-atlas based text output and vertex
> based polygon graphics, double buffering, full screen support with very
> little Xorg interaction for the GX pathway, etc? It should be *far*
> faster and lower overhead than the current ncurses->xterm->Wayland
> levels of indirection... and it could open up a new world of on-screen
> profiling information as well. Basically a very simple self-sustained
> OpenGL game engine for the key low level graphics primitives of modern
> GFX hardware. I could whip up a prototype if there's interest.
I'm not familiar with the fancy GFX things but it'd be interesting to
see. Not sure how well we can manage it in the future though, as we
have unloved GTK2 UI as well.
Thanks,
Namhyung
Powered by blists - more mailing lists