Message-ID: <CAP-5=fWHivx7Q6okMwYOs=65MZr40RjE16tgatMX0hjkCfrwfw@mail.gmail.com>
Date: Tue, 14 Jan 2025 21:59:20 -0800
From: Ian Rogers <irogers@...gle.com>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Namhyung Kim <namhyung@...nel.org>, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, eranian@...gle.com,
Ingo Molnar <mingo@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
chu howard <howardchu95@...il.com>
Subject: Re: [PATCH v2] perf report: Add wall-clock and parallelism profiling
On Tue, Jan 14, 2025 at 12:27 AM Dmitry Vyukov <dvyukov@...gle.com> wrote:
[snip]
> FWIW I've also considered and started implementing a different
> approach where the kernel would count parallelism level for each
> context and write it out with samples:
> https://github.com/dvyukov/linux/commit/56ee1f638ac1597a800a30025f711ab496c1a9f2
> Then sched_in/out would need to do atomic inc/dec on a global counter.
> Not sure how hard it is to make all corner cases work there. I dropped
> it halfway b/c the perf record post-processing looked like a better
> approach.
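A toy, single-threaded model of the counting idea in the quoted text. The class and method names are hypothetical, a plain integer stands in for the global atomic counter that the kernel's sched_in/out paths would increment and decrement, and none of the corner cases mentioned are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ContextParallelism:
    """Hypothetical model of one profiled context's parallelism counter."""

    running: int = 0                              # threads currently on a CPU
    samples: list = field(default_factory=list)   # (ip, parallelism level)

    def sched_in(self) -> None:
        # The kernel version would need an atomic increment here.
        self.running += 1

    def sched_out(self) -> None:
        # ...and an atomic decrement here.
        if self.running == 0:
            raise RuntimeError("sched_out without a matching sched_in")
        self.running -= 1

    def sample(self, ip: int) -> None:
        # Each sample records the parallelism level seen when it was taken.
        self.samples.append((ip, self.running))


ctx = ContextParallelism()
ctx.sched_in()        # thread A starts running
ctx.sched_in()        # thread B starts running
ctx.sample(0x1000)    # recorded with level 2
ctx.sched_out()       # thread B is descheduled
ctx.sample(0x2000)    # recorded with level 1
```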
Nice. Just to focus on this point and go off on something of a
tangent: I worry a little about perf_event_sample_format, where we've
used 25 of the 64 bits of sample_type. Perhaps there will be a
sample_type2 in the future. For the code and data page size, it seems
the same information could come from mmap events. You have a similar
issue. I was thinking of another similar issue: adding information
about the number of dirty pages in a VMA. I wonder if there is a
better way to organize these things, rather than continuing to use up
bits in perf_event_sample_format. For example, we could have a "code
page size" software event that, when placed in a sampling group led by
a hardware event that samples the IP, provides the code page size for
the leader event's sample IP. We have loads of space in the type and
config values for an effectively endless number of such events, and
maybe the value could be generated by a BPF program for yet more
flexibility. I'm not sure what these events would mean without a
leader sampling event.
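A sketch of the proposal above, written as a Python model rather than kernel code. The flag list follows the PERF_SAMPLE_* bit order in the uapi perf_event.h header, which is where the 25-of-64 count comes from. The "code page size" group member, Mapping, code_page_size() and leader_sample() are all hypothetical; the page-size lookup stands in for whatever the kernel or a BPF program would actually compute.

```python
from dataclasses import dataclass

# PERF_SAMPLE_* flags in bit order 0..24, as in the uapi perf_event.h.
PERF_SAMPLE_FLAGS = [
    "IP", "TID", "TIME", "ADDR", "READ", "CALLCHAIN", "ID", "CPU",
    "PERIOD", "STREAM_ID", "RAW", "BRANCH_STACK", "REGS_USER",
    "STACK_USER", "WEIGHT", "DATA_SRC", "IDENTIFIER", "TRANSACTION",
    "REGS_INTR", "PHYS_ADDR", "AUX", "CGROUP", "DATA_PAGE_SIZE",
    "CODE_PAGE_SIZE", "WEIGHT_STRUCT",
]
USED_BITS = len(PERF_SAMPLE_FLAGS)   # 25
FREE_BITS = 64 - USED_BITS           # 39 left in the 64-bit sample_type


@dataclass(frozen=True)
class Mapping:
    """Hypothetical: one executable mapping, as learned from mmap events."""
    start: int
    end: int          # exclusive
    page_size: int


def code_page_size(ip: int, mappings: list) -> int:
    """Value a hypothetical 'code page size' group member would report."""
    for m in mappings:
        if m.start <= ip < m.end:
            return m.page_size
    return 0  # IP not covered by any known mapping


def leader_sample(ip: int, mappings: list) -> dict:
    # Modeled on PERF_SAMPLE_READ with PERF_FORMAT_GROUP: the leader's
    # sample carries the group members' values, so the extra information
    # needs a new event config rather than a new sample_type bit.
    return {"ip": ip, "group": {"code_page_size": code_page_size(ip, mappings)}}
```

The design point is that the per-sample cost moves from the fixed sample_type bitmask into the much larger type/config space, at the price of only being meaningful relative to a leader event.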
Wrt wall-clock time, Howard Chu has done some work advancing off-CPU
sampling; wall-clock time is off-CPU time plus on-CPU time. We need to
do something to move the default flags/options for perf record
forward. For example, we don't enable build ID mmap events by default,
which causes the whole perf.data file to be scanned to add build ID
events for the DSOs that have samples in them. Off-CPU profiling is
one option that could become a default, and when permissions deny the
BPF approach we can fall back to using events. If these events are
there by default, then it makes sense to hook them up in perf report.
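A small worked example of "wall-clock time is off-CPU time plus on-CPU time" for a single thread. The input format (time-ordered switch-in/switch-out timestamps, such as could be derived from context-switch events) and the function name are assumptions for illustration, not perf's data model.

```python
def split_wall_clock(switches, start, end):
    """Split one thread's [start, end) window into (on_cpu, off_cpu) time.

    switches: time-ordered (timestamp, "in" | "out") context-switch events
    for the thread, all within the window. The thread is assumed to be
    off-CPU at `start`.
    """
    on_cpu = 0
    running_since = None
    for ts, kind in switches:
        if kind == "in" and running_since is None:
            running_since = ts
        elif kind == "out" and running_since is not None:
            on_cpu += ts - running_since
            running_since = None
    if running_since is not None:      # still running at the window's end
        on_cpu += end - running_since
    off_cpu = (end - start) - on_cpu   # wall-clock = on-CPU + off-CPU
    return on_cpu, off_cpu


# On CPU for 10..30 and 50..60 in a 0..100 window: 30 on, 70 off.
print(split_wall_clock([(10, "in"), (30, "out"), (50, "in"), (60, "out")], 0, 100))
```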
Wrt perf report, I keep trying to push the Python support in perf
forward. These unmerged changes show an event being parsed and
ring-buffer-based sampling in a reasonably small number of lines of
code, in a way not dissimilar to a perf command line:
https://lore.kernel.org/lkml/20250109075108.7651-12-irogers@google.com/
Building a better UI on top of this in Python means there are some
reasonable frameworks that can be leveraged. I particularly like the
look of Textual:
https://github.com/textualize/textual-demo
which, imo, would move things a lot further forward than UI code in C
on top of slang/stdio.
Sorry for all this tangential stuff. I like the work and will try to
delve into specifics later.
Thanks,
Ian