Message-ID: <CACT4Y+YvWYFBkZ9jQ2kuTOHb6pZQwWXc9sOJ5Km0Wr1fLi-94A@mail.gmail.com>
Date: Mon, 5 May 2025 10:03:05 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Ingo Molnar <mingo@...nel.org>, Arnaldo Carvalho de Melo <acme@...nel.org>, Ian Rogers <irogers@...gle.com>, 
	Kan Liang <kan.liang@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Peter Zijlstra <peterz@...radead.org>, 
	LKML <linux-kernel@...r.kernel.org>, linux-perf-users@...r.kernel.org, 
	Andi Kleen <ak@...ux.intel.com>
Subject: Re: [RFC/PATCH] perf report: Support latency profiling in system-wide mode

On Sun, 4 May 2025 at 21:52, Namhyung Kim <namhyung@...nel.org> wrote:
>
> Hi Ingo,
>
> Thanks for sharing your opinion.
>
> On Sun, May 04, 2025 at 10:22:26AM +0200, Ingo Molnar wrote:
> >
> > * Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > > When it profiles a target process (and its children), it's
> > > straightforward to track parallelism using sched-switch info.  The
> > > parallelism is kept at the machine level in this case.
> > >
> > > But when it profiles multiple processes, as in system-wide mode, it
> > > is not clear how to apply the (machine-level) parallelism to
> > > different tasks.  That's why latency profiling was disabled for
> > > system-wide mode.
> > >
> > > But it should be possible to track parallelism in each process, and
> > > that would be useful for profiling latency issues in multi-threaded
> > > programs.  So this patch tries to enable it.
> > >
> > > However, using sched-switch info can be a problem since it emits a
> > > lot more data, with more chances of losing data when perf cannot
> > > keep up with it.
> > >
> > > Instead, it can maintain the current process for each CPU as it
> > > sees samples, and update that process's parallelism so that the
> > > latency can be calculated from it.  One more improvement is to
> > > exclude the idle task from the latency calculation.
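
In concrete terms, the accounting could look roughly like the minimal
sketch below. This is not the actual perf code: the names, the
fixed-size tables, and the toy sample stream are all made up for
illustration (a real implementation would key per-process state by
pid in a map).

  /* Infer the current process on each CPU from the sample stream,
   * derive per-process parallelism from that, and weight each sample's
   * period by 1/parallelism to get the latency (wall-clock) view.
   * The idle task (pid 0) is excluded. */
  #include <stdio.h>

  #define NR_CPUS  4
  #define MAX_PIDS 16

  struct sample { int cpu, pid; unsigned long period; };

  static int cpu_curr[NR_CPUS];      /* pid last seen on each CPU */
  static int parallelism[MAX_PIDS];  /* #CPUs currently running pid */
  static double overhead[MAX_PIDS];  /* plain sum of sample periods */
  static double latency[MAX_PIDS];   /* periods scaled by 1/parallelism */

  static void account(const struct sample *s)
  {
          int prev = cpu_curr[s->cpu];

          if (prev != s->pid) {      /* this CPU switched tasks */
                  if (prev > 0)
                          parallelism[prev]--;
                  cpu_curr[s->cpu] = s->pid;
                  if (s->pid > 0)
                          parallelism[s->pid]++;
          }

          if (s->pid == 0)           /* skip the idle task */
                  return;

          overhead[s->pid] += s->period;
          latency[s->pid]  += (double)s->period / parallelism[s->pid];
  }

  int main(void)
  {
          /* Toy stream: pid 1 runs on two CPUs at once, pid 2 on one. */
          static const struct sample stream[] = {
                  { 0, 1, 100 }, { 1, 1, 100 }, { 2, 2, 100 },
                  { 0, 1, 100 }, { 1, 1, 100 }, { 2, 2, 100 },
          };
          int i;

          for (i = 0; i < (int)(sizeof(stream) / sizeof(stream[0])); i++)
                  account(&stream[i]);

          for (i = 1; i < MAX_PIDS; i++)
                  if (overhead[i] > 0)
                          printf("pid %d: overhead %.0f  latency %.0f\n",
                                 i, overhead[i], latency[i]);
          return 0;
  }

With this toy stream, pid 1 gets twice pid 2's CPU-time overhead (400
vs 200) but only 1.25x its wall-clock-weighted latency (250 vs 200),
because it ran on two CPUs in parallel.
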
> > >
> > > Here's an example:
> > >
> > >   # perf record -a -- perf bench sched messaging
> > >
> > > This basically forks the sender and receiver tasks for the run.
> > >
> > >   # perf report --latency -s comm --stdio
> > >   ...
> > >   #
> > >   #  Latency  Overhead  Command
> > >   # ........  ........  ...............
> > >   #
> > >       98.14%    95.97%  sched-messaging
> > >        0.78%     0.93%  gnome-shell
> > >        0.36%     0.34%  ptyxis
> > >        0.23%     0.23%  kworker/u112:0-
> > >        0.23%     0.44%  perf
> > >        0.08%     0.10%  KMS thread
> > >        0.05%     0.05%  rcu_preempt
> > >        0.05%     0.05%  kworker/u113:2-
> > >        ...
> >
> > Just a generic user-interface comment: I had to look up what 'latency'
> > means in this context, and went about 3 hops deep into various pieces
> > of description until I found Documentation/cpu-and-latency-overheads.txt,
> > where after a bit of head-scratching I realized that 'latency' is a
> > weird alias for 'wall-clock time'...
> >
> > This is *highly* confusing terminology IMHO.
>
> Sorry for the confusion.  I know I'm terrible at naming things. :)
>
> Actually Dmitry used the term 'wall-clock' profiling at first, when he
> implemented this feature, but I thought it was not clear what it would
> mean for non-cycle events.  As 'overhead' is also a generic term, we
> ended up with 'latency'.

Exactly :)

I've also talked with a bunch of people about this, and everybody
proposes their own term and is confused by all other proposals.

The problem is not just that we lacked this fundamental profiling
capability in all profilers out there; we, as a community, also still
don't know how to even talk about these things...

There is no terminology that would be clear to everybody. E.g. when
some people hear wall-clock, they assume it samples every thread
(runnable and non-runnable) every time unit. But that's a vastly
different profile from this one: there, even blocked threads keep
accumulating time, while here only on-CPU samples are counted,
weighted by parallelism.


> > 'Latency' is a highly overloaded concept that almost never corresponds
> > to 'wall clock time'. It usually means a relative delay value, which is
> > why I initially thought this somehow means instruction-latency or
> > memory-latency profiling ...
> >
> > I.e. 'latency', in its naive meaning, is on the exact opposite end
> > of the terminology spectrum from where it should be: it suggests
> > relative time, while in reality it's connected to wall-clock/absolute
> > time ...
> >
> > *Please* use something else. Wall-clock is fine, as
> > cpu-and-latency-overheads.txt uses initially, but so would be other
> > combinations:
> >
> >    #1: 'CPU time' vs. 'real time'
> >
> >         This is short, although a disadvantage is possible
> >         confusion with 'real-time kernel'.
> >
> >    #2: 'CPU time' vs. 'wall-clock time'
> >
> >         This is longer but OK and unambiguous.
> >
> >    #3: 'relative time' vs. 'absolute time'
> >
> >         This is short and straightforward, and might be my favorite
> >         personally, because relative/absolute is such an unambiguous
> >         and well-known terminology and often paired in a similar
> >         fashion.
> >
> >    #4: 'CPU time' vs. 'absolute time'
> >
> >         This is a combination of #1 and #3 that keeps the 'CPU time'
> >         terminology for relative time. The CPU/absolute pairing is not
> >         that intuitive though.
> >
> >    #5: 'CPU time' vs. 'latency'
> >
> >         This is really, really bad and unintuitive. Sorry to be so
> >         harsh and negative about this choice, but this is such a nice
> >         feature, which suffers from confusing naming. :-)
>
> Thanks for your suggestions.  My main concern is that it's not just
> about cpu-time and wallclock-time.  perf tools can measure many kinds
> of events with different meanings, so I think we need generic terms
> to cover them.

I don't see a conflict there. It's possible to sample, say, cache
misses per unit of CPU time or per unit of wall-clock time.
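
For example, one could record a system-wide session with a hardware
event and compare the two views (a hypothetical invocation; it assumes
this patch is applied so that --latency works in system-wide mode):

  # perf record -a -e cache-misses -- perf bench sched messaging
  # perf report --latency -s comm --stdio

The Latency column would then attribute cache misses scaled by each
process's parallelism at sample time (the wall-clock view), while the
Overhead column keeps the plain per-CPU attribution.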
