Message-ID: <CACT4Y+b4WYa9TSqKtDKTJNgXth1U30=KddutfSdp5gmXVOV_jA@mail.gmail.com>
Date: Mon, 13 Jan 2025 13:25:19 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: namhyung@...nel.org, irogers@...gle.com
Cc: linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] tools/perf: Add wall-clock and parallelism profiling
On Wed, 8 Jan 2025 at 09:34, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>
> On Wed, 8 Jan 2025 at 09:24, Dmitry Vyukov <dvyukov@...gle.com> wrote:
> >
> > There are two notions of time: wall-clock time and CPU time.
> > For a single-threaded program, or a program running on a single-core
> > machine, these notions are the same. However, for a multi-threaded/
> > multi-process program running on a multi-core machine, these notions are
> > significantly different. For each second of wall-clock time we get
> > number-of-cores seconds of CPU time.
> >
> > Currently perf only allows profiling CPU time. Perf (and, to the best
> > of my knowledge, all other existing profilers) does not allow profiling
> > wall-clock time.
> >
> > Optimizing CPU overhead is useful to improve 'throughput', while
> > optimizing wall-clock overhead is useful to improve 'latency'.
> > These profiles are complementary and are not interchangeable.
> > Examples of where a wall-clock profile is needed:
> > - optimizing build latency
> > - optimizing server request latency
> > - optimizing ML training/inference latency
> > - optimizing running time of any command line program
> >
> > A CPU profile is at best useless for these use cases (if a user
> > understands the difference), and at worst misleading (if a user tries
> > to use the wrong profile for the job).
> >
> > This patch adds wall-clock and parallelism profiling.
> > See the added documentation and flags descriptions for details.
> >
> > Brief outline of the implementation:
> > - add context switch collection during record
> > - calculate number of threads running on CPUs (parallelism level)
> > during report
> > - divide each sample weight by the parallelism level
> > This effectively models taking 1 sample per unit of wall-clock time
> > (see the sketch below).
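> >
> > As a minimal sketch of the weighting idea (not the actual patch code;
> > all names below are made up for illustration): report keeps a running
> > count of workload threads that are currently on-CPU, updated from the
> > recorded context-switch events, and accounts 1/parallelism instead of 1
> > to the wall-clock column for every sample:
> >
> > struct sym_stats {
> >         double cpu_weight;
> >         double wallclock_weight;
> > };
> >
> > static int parallelism = 1;     /* workload threads currently on-CPU */
> >
> > /* Called for every recorded sched-in/sched-out event, in time order. */
> > static void on_context_switch(int sched_in)
> > {
> >         parallelism += sched_in ? 1 : -1;
> >         if (parallelism < 1)
> >                 parallelism = 1;  /* the sampled thread itself runs */
> > }
> >
> > /* Called for every sample attributed to symbol 'sym'. */
> > static void account_sample(struct sym_stats *sym)
> > {
> >         sym->cpu_weight       += 1.0;                 /* CPU profile */
> >         sym->wallclock_weight += 1.0 / parallelism;   /* wall-clock */
> > }
> >
> > Summed over all samples, the wall-clock weights then approximate
> > elapsed time rather than total CPU time.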
> >
> > The feature is added on an equal footing with the existing CPU profiling
> > rather than as a separate mode enabled with special flags. The reasoning
> > is that users may not understand the problem and the meaning of the
> > numbers they are seeing in the first place, so they won't even realize
> > that they may need to look for a different profiling mode. When they are
> > presented with 2 sets of different numbers, they should start asking
> > questions.
>
> Hi folks,
>
> Am I missing something, or is this already possible/known?
>
> I understand this is a large change, and I am open to comments.
> I've also uploaded it to gerrit if you prefer to review there:
> https://linux-review.git.corp.google.com/c/linux/kernel/git/torvalds/linux/+/25608
>
> You may also check out that branch and try it locally. It works on older kernels.
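>
> For a local try-out, roughly (a hedged sketch: --switch-events and
> --hierarchy are pre-existing perf flags; the patch itself collects
> context switches during record and shows the new columns in report
> without extra flags, see the patch docs for the exact invocation):
>
>   perf record --switch-events -- make -j128   # or any other workload
>   perf report --hierarchy      # patched report adds Wallclock column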
>
> How much of this is testable within the current testing framework?
> Also, how do I run the tests? I failed to figure it out.
>
> Btw, the profile example in the docs is from a real kernel build on my machine.
> You can see how misleading the current profile is wrt latency.
>
> Or you can see what takes time in building perf itself
> (despite -j128, 73% of the time was spent with only 1 running thread,
> and only a few percent of the time was spent with high parallelism).
>
> Wallclock  Overhead  Parallelism / Command
> -  73.64%     6.96%  1
>    + 28.53%     2.70%     cc1
>    + 17.93%     1.69%     python3
>    + 10.79%     1.02%     ld
> -   7.49%     1.42%  2
>    +  4.26%     0.81%     cc1
>    +  0.72%     0.14%     ld
>    +  0.68%     0.13%     cc1plus
> ...
> -   1.33%    15.74%  125
>    +  1.23%    14.50%     cc1
>    +  0.03%     0.33%     gcc
>    +  0.03%     0.32%     sh
> [PATCH] tools/perf: Add wall-clock and parallelism profiling
Note to myself: need to change the subject to "perf report:".