[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aMqrmmDG65BGeZp0@x1>
Date: Wed, 17 Sep 2025 09:37:46 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Gautam Menghani <gautam@...ux.ibm.com>, peterz@...radead.org,
mingo@...hat.com, namhyung@...nel.org, mark.rutland@....com,
alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
adrian.hunter@...el.com, kan.liang@...ux.intel.com,
maddy@...ux.ibm.com, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] perf python: Add an example for sampling
On Tue, Sep 16, 2025 at 01:07:43PM -0700, Ian Rogers wrote:
> On Tue, Sep 16, 2025 at 12:25 PM Arnaldo Carvalho de Melo
> <acme@...nel.org> wrote:
> >
> > On Tue, Sep 16, 2025 at 07:00:48PM +0530, Gautam Menghani wrote:
> > > Hi Ian/Arnaldo,
> > >
> > > Can you please review this series and let me know if any changes are
> > > needed?
> >
> > Looking at it now, sry for the delay,
>
> I think the patches look good. I'm a little concerned that the python
> APIs are a chance to do something better than the C APIs that have
> evolved. For example, we removed UID out of target recently [1] as the
> BPF alternative was better. Had this patch come earlier then it seems
> likely we'd have had target with UIDs. I wonder rather than having a
> kwlist of:
>
> + static char *kwlist[] = { "target", "inherit_stat", "no_buffering",
> "no_inherit",
> + "no_inherit_set", "no_samples", "raw_samples",
> + "sample_address", "sample_phys_addr", "sample_data_page_size",
> + "sample_code_page_size", "sample_weight", "sample_time",
> + "sample_time_set", "sample_cpu", "sample_identifier",
> + "sample_data_src", "period", "period_set", "running_time",
> + "full_auxtrace", "auxtrace_snapshot_mode",
> + "auxtrace_snapshot_on_exit", "auxtrace_sample_mode",
> + "record_namespaces", "record_cgroup", "record_switch_events",
> + "record_switch_events_set", "all_kernel", "all_user",
> + "kernel_callchains", "user_callchains", "tail_synthesize",
> + "overwrite", "ignore_missing_thread", "strict_freq", "sample_id",
> + "no_bpf_event", "kcore", "text_poke", "build_id", "freq",
> + "mmap_pages", "auxtrace_mmap_pages", "user_freq", "branch_stack",
> + "sample_intr_regs", "sample_user_regs", "default_interval",
> + "user_interval", "auxtrace_snapshot_size", "auxtrace_snapshot_opts",
> + "auxtrace_sample_opts", "sample_transaction", "use_clockid",
> + "clockid", "clockid_res_ns", "nr_cblocks", "affinity", "mmap_flush",
> + "comp_level", "nr_threads_synthesize", "ctl_fd", "ctl_fd_ack",
> + "ctl_fd_close", "synth", "threads_spec", "threads_user_spec",
> + "off_cpu_thresh_ns", NULL };
>
> but then just using this subset:
>
> + opts = perf.record_opts(freq=1000, target=tgt, sample_time=True,
> + sample_cpu=True, no_buffering=True,
> no_inherit=True)
>
> The kwlist should be kept to just those necessary values for the
> example to work? I kind of see this as Arnaldo's baby, so he may just
> want everything, so this needn't be a blocker.
>
> Bigger picture I wonder about migrating the `perf script` code to just
> being regular python programs like the example here.
You mean:
acme@...ber:~/git/perf-tools-next$ ls -la tools/perf/scripts/python/
total 452
drwxr-xr-x. 1 acme acme 902 Aug 20 14:18 .
drwxr-xr-x. 1 acme acme 30 Sep 17 09:27 ..
-rwxr-xr-x. 1 acme acme 11865 Aug 20 14:06 arm-cs-trace-disasm.py
drwxr-xr-x. 1 acme acme 1640 Aug 20 14:18 bin
-rw-r--r--. 1 acme acme 2461 Apr 16 10:06 check-perf-trace.py
-rw-r--r--. 1 acme acme 7923 Apr 16 10:06 compaction-times.py
-rw-r--r--. 1 acme acme 7497 Apr 16 10:06 event_analyzing_sample.py
-rwxr-xr-x. 1 acme acme 157369 Aug 20 14:18 exported-sql-viewer.py
-rw-r--r--. 1 acme acme 39845 Apr 16 10:06 export-to-postgresql.py
-rw-r--r--. 1 acme acme 24671 Apr 16 10:06 export-to-sqlite.py
-rw-r--r--. 1 acme acme 2173 Apr 16 10:06 failed-syscalls-by-pid.py
-rwxr-xr-x. 1 acme acme 10377 Aug 20 14:18 flamegraph.py
-rw-r--r--. 1 acme acme 1717 Apr 16 10:06 futex-contention.py
-rw-r--r--. 1 acme acme 13302 Apr 16 10:06 gecko.py
-rw-r--r--. 1 acme acme 14636 Apr 16 10:06 intel-pt-events.py
-rw-r--r--. 1 acme acme 3395 Apr 16 10:06 libxed.py
-rw-r--r--. 1 acme acme 4230 Aug 20 14:13 mem-phys-addr.py
-rw-r--r--. 1 acme acme 15420 Aug 20 14:05 netdev-times.py
-rwxr-xr-x. 1 acme acme 1833 Apr 16 10:06 net_dropmonitor.py
-rwxr-xr-x. 1 acme acme 30683 Aug 20 14:05 parallel-perf.py
drwxr-xr-x. 1 acme acme 34 Sep 17 09:27 Perf-Trace-Util
-rw-r--r--. 1 acme acme 4509 Apr 16 10:06 powerpc-hcalls.py
-rw-r--r--. 1 acme acme 12095 Apr 16 10:06 sched-migration.py
-rw-r--r--. 1 acme acme 2183 Apr 16 10:06 sctop.py
-rwxr-xr-x. 1 acme acme 4408 Apr 16 10:06 stackcollapse.py
-rw-r--r--. 1 acme acme 2444 Apr 16 10:06 stat-cpi.py
-rw-r--r--. 1 acme acme 2055 Apr 16 10:06 syscall-counts-by-pid.py
-rw-r--r--. 1 acme acme 1673 Apr 16 10:06 syscall-counts.py
-rwxr-xr-x. 1 acme acme 34014 Apr 16 10:06 task-analyzer.py
acme@...ber:~/git/perf-tools-next$
And then make:
acme@...ber:~/git/perf-tools-next$ perf script -l
List of available trace scripts:
compaction-times [-h] [-u] [-p|-pv] [-t | [-m] [-fs] [-ms]] [pid|pid-range|comm-regex] display time taken by mm compaction
event_analyzing_sample analyze all perf samples
export-to-postgresql [database name] [columns] [calls] export perf data to a postgresql database
export-to-sqlite [database name] [columns] [calls] export perf data to a sqlite3 database
failed-syscalls-by-pid [comm] system-wide failed syscalls, by pid
flamegraph create flame graphs
futex-contention futext contention measurement
gecko create firefox gecko profile json format from perf.data
intel-pt-events print Intel PT Events including Power Events and PTWRITE
mem-phys-addr resolve physical address samples
net_dropmonitor display a table of dropped frames
netdev-times [tx] [rx] [dev=] [debug] display a process of packet and processing time
powerpc-hcalls
sched-migration sched migration overview
sctop [comm] [interval] syscall top
stackcollapse produce callgraphs in short form for scripting use
syscall-counts-by-pid [comm] system-wide syscall counts, by pid
syscall-counts [comm] system-wide syscall counts
task-analyzer analyze timings of tasks
failed-syscalls [comm] system-wide failed syscalls
rw-by-file <comm> r/w activity for a program, by file
rw-by-pid system-wide r/w activity
rwtop [interval] system-wide r/w top
wakeup-latency system-wide min/max/avg wakeup latency
acme@...ber:~/git/perf-tools-next$
And make:
perf script rwtop
Just call 'python PATH_TO_PYTHON_SCRIPTS/rwtop.py' transparently?
That looks interesting indeed, that way we would stop linking with
libpython, etc.
I wonder if there are out of tree scripts using the current tools/perf/util/scripting-engines/trace-event-python.c
mechanism...
But even that can fallback to a python based mechanism, right?
Import the script, if it has a given structure, use the new way, if not,
call a glue that reads the events and feed to the old style code.
Seems doable and would save code on the main perf binary and headaches
with the libpython and libperl build processes.
- Arnaldo
> I sent out
> deprecating the libperl code to this ends (looking for reviews):
> https://lore.kernel.org/linux-perf-users/20250908181918.3533480-1-irogers@google.com/
> The issue is that `perf script` being the main thread inhibits things
> like textual running until trace_end. This means we can't do things
> like incremental loading support. We may want to make the perf events
> support something like an asyncio interface for that.
>
> Refactoring that support will likely raise backward compatibility
> concerns. It'd be a really nice thing to do as the API has some fairly
> major overheads like turning everything in a sample into a Dict
> whether needed or not:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/scripting-engines/trace-event-python.c#n838
> I mention this just to say why I'd like to minimize the API when possible.
>
> Thanks,
> Ian
>
> [1] https://lore.kernel.org/r/20250604174545.2853620-10-irogers@google.com
Powered by blists - more mailing lists