Message-ID: <CACT4Y+b9B6=ijtGfrtwQFq44ceiRTmv53pC9iqgNsbeg=cFr4Q@mail.gmail.com>
Date: Wed, 5 Feb 2025 17:37:25 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: irogers@...gle.com, acme@...nel.org, linux-perf-users@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 0/8] perf report: Add latency and parallelism profiling

On Tue, 4 Feb 2025 at 05:21, Namhyung Kim <namhyung@...nel.org> wrote:
>
> Hello,
>
> On Mon, Feb 03, 2025 at 03:30:35PM +0100, Dmitry Vyukov wrote:
> > There are two notions of time: wall-clock time and CPU time.
> > For a single-threaded program, or a program running on a single-core
> > machine, these notions are the same. However, for a multi-threaded/
> > multi-process program running on a multi-core machine, these notions are
> > significantly different. For each second of wall-clock time we have
> > number-of-cores seconds of CPU time.
> >
> > Currently perf can only profile CPU time. Perf (and, to the best of my
> > knowledge, all other existing profilers) cannot profile wall-clock
> > time.
> >
> > Optimizing CPU overhead is useful to improve 'throughput', while
> > optimizing wall-clock overhead is useful to improve 'latency'.
> > These profiles are complementary and are not interchangeable.
> > Examples of where latency profile is needed:
> >  - optimizing build latency
> >  - optimizing server request latency
> >  - optimizing ML training/inference latency
> >  - optimizing running time of any command line program
> >
> > CPU profile is useless for these use cases at best (if a user understands
> > the difference), or misleading at worst (if a user tries to use a wrong
> > profile for a job).
> >
> > This series adds latency and parallelization profiling.
> > See the added documentation and flags descriptions for details.
> >
> > Brief outline of the implementation:
> >  - add context switch collection during record
> >  - calculate number of threads running on CPUs (parallelism level)
> >    during report
> >  - divide each sample weight by the parallelism level
> > This effectively models that we were taking 1 sample per unit of
> > wall-clock time.
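
The three steps outlined above can be illustrated with a minimal sketch.
This is not the code from the series: the event tuple format and the names
latency_profile and demo_events are hypothetical, chosen only to show how
dividing each sample weight by the parallelism level turns a CPU profile
into a latency profile.

```python
from collections import defaultdict


def latency_profile(events):
    """Accumulate CPU and latency weight per symbol.

    `events` is a time-ordered list of tuples (a made-up format):
      ("switch_in", tid)               thread starts running on a CPU
      ("switch_out", tid)              thread stops running
      ("sample", tid, symbol, weight)  profiling sample
    """
    running = set()               # threads currently on a CPU
    cpu = defaultdict(float)      # plain CPU overhead
    latency = defaultdict(float)  # weight scaled by 1/parallelism
    for ev in events:
        kind = ev[0]
        if kind == "switch_in":
            running.add(ev[1])
        elif kind == "switch_out":
            running.discard(ev[1])
        elif kind == "sample":
            _, tid, symbol, weight = ev
            running.add(tid)      # a sampled thread is running by definition
            parallelism = len(running)
            cpu[symbol] += weight
            latency[symbol] += weight / parallelism
    return dict(cpu), dict(latency)


def demo_events():
    """One serial phase (1 thread) followed by a 4-thread parallel phase."""
    events = [("switch_in", 1)]
    events += [("sample", 1, "parse", 1.0)] * 2
    events += [("switch_in", tid) for tid in (2, 3, 4)]
    for _ in range(2):
        events += [("sample", tid, "compile", 1.0) for tid in (1, 2, 3, 4)]
    return events


if __name__ == "__main__":
    cpu, latency = latency_profile(demo_events())
    print("cpu:", cpu)
    print("latency:", latency)
```

In this toy trace "parse" accounts for 2 of 10 CPU samples (20%), but for
half of the latency weight (2.0 of 4.0), because it runs while only one
thread is on a CPU.
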
> >
> > We still default to the CPU profile, so it's up to users to learn
> > about the second profiling mode and use it when appropriate.
> >
> > Changes in v4:
> >  - added "Shrink struct hist_entry size" commit
> >  - rebased to perf-tools-next HEAD
> >
> > Changes in v3:
> >  - rebase and split into patches
> >  - rename 'wallclock' to 'latency' everywhere
> >  - don't enable latency profiling by default,
> >    instead add record/report --latency flag
> >
> > Dmitry Vyukov (8):
> >   perf report: Add machine parallelism
> >   perf report: Add parallelism sort key
> >   perf report: Switch filtered from u8 to u16
> >   perf report: Add parallelism filter
> >   perf report: Add latency output field
> >   perf report: Add --latency flag
> >   perf report: Add latency and parallelism profiling documentation
> >   perf hist: Shrink struct hist_entry size
>
> I found that the stdio output is broken: the latency values are too
> high. The TUI is fine.
>
>   $ ./perf report --latency -s dso --stdio
>   ...
>   #
>   #  Latency  Overhead  Shared Object
>   # ........  ........  ....................
>   #
>      57669.15%    57.67%  [kernel.kallsyms]
>      42330.85%    42.33%  ld-linux-x86-64.so.2

Fixed in v5.

> Thanks,
> Namhyung
>
> >
> >  .../callchain-overhead-calculation.txt        |  5 +-
> >  .../cpu-and-latency-overheads.txt             | 85 ++++++++++++++++++
> >  tools/perf/Documentation/perf-report.txt      | 49 ++++++----
> >  tools/perf/Documentation/tips.txt             |  3 +
> >  tools/perf/builtin-record.c                   | 20 +++++
> >  tools/perf/builtin-report.c                   | 39 ++++++++
> >  tools/perf/ui/browsers/hists.c                | 27 +++---
> >  tools/perf/ui/hist.c                          | 64 +++++++++----
> >  tools/perf/util/addr_location.c               |  1 +
> >  tools/perf/util/addr_location.h               |  7 +-
> >  tools/perf/util/event.c                       | 11 +++
> >  tools/perf/util/events_stats.h                |  2 +
> >  tools/perf/util/hist.c                        | 90 +++++++++++++++----
> >  tools/perf/util/hist.h                        | 32 +++++--
> >  tools/perf/util/machine.c                     |  7 ++
> >  tools/perf/util/machine.h                     |  6 ++
> >  tools/perf/util/sample.h                      |  2 +-
> >  tools/perf/util/session.c                     | 12 +++
> >  tools/perf/util/session.h                     |  1 +
> >  tools/perf/util/sort.c                        | 69 ++++++++++++--
> >  tools/perf/util/sort.h                        |  3 +-
> >  tools/perf/util/symbol.c                      | 34 +++++++
> >  tools/perf/util/symbol_conf.h                 |  8 +-
> >  23 files changed, 502 insertions(+), 75 deletions(-)
> >  create mode 100644 tools/perf/Documentation/cpu-and-latency-overheads.txt
> >
> >
> > base-commit: 8ce0d2da14d3fb62844dd0e95982c194326b1a5f
> > --
> > 2.48.1.362.g079036d154-goog
> >
