Message-ID: <20251029053413.355154-1-irogers@google.com>
Date: Tue, 28 Oct 2025 22:33:58 -0700
From: Ian Rogers <irogers@...gle.com>
To: Suzuki K Poulose <suzuki.poulose@....com>, Mike Leach <mike.leach@...aro.org>,
James Clark <james.clark@...aro.org>, John Garry <john.g.garry@...cle.com>,
Will Deacon <will@...nel.org>, Leo Yan <leo.yan@...ux.dev>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Charlie Jenkins <charlie@...osinc.com>,
Thomas Falcon <thomas.falcon@...el.com>, Yicong Yang <yangyicong@...ilicon.com>,
Thomas Richter <tmricht@...ux.ibm.com>, Athira Rajeev <atrajeev@...ux.ibm.com>,
Howard Chu <howardchu95@...il.com>, Song Liu <song@...nel.org>,
Dapeng Mi <dapeng1.mi@...ux.intel.com>, Levi Yun <yeoreum.yun@....com>,
Zhongqiu Han <quic_zhonhan@...cinc.com>, Blake Jones <blakejones@...gle.com>,
Anubhav Shelat <ashelat@...hat.com>, Chun-Tse Shao <ctshao@...gle.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Jean-Philippe Romain <jean-philippe.romain@...s.st.com>, Gautam Menghani <gautam@...ux.ibm.com>,
Dmitry Vyukov <dvyukov@...gle.com>, Yang Li <yang.lee@...ux.alibaba.com>,
linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
Andi Kleen <ak@...ux.intel.com>, Weilin Wang <weilin.wang@...el.com>
Subject: [RFC PATCH v1 00/15] Addition of session API to python module
The perf script command uses a session with process_events to call
through to the python process_events function. The event is turned
into a python dictionary, whether the entries are used or not, adding
overhead. To avoid the overhead, add a session API abstraction and
pass callbacks that can be used to perform the existing perf script
functions. The implementation is incomplete in this RFC.
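As a rough illustration of the overhead described above (and not the API added by this series), the sketch below contrasts eagerly building a dictionary for every event with handing the callback a lightweight object whose fields are converted only when read. All names here (`RawSample`, `process_eager`, `process_lazy`) are hypothetical and use plain Python rather than the perf module.

```python
# Hypothetical sketch: eager dict conversion vs. a callback on a lazy object.
# None of these names come from perf; they only model the idea.

class RawSample:
    """Stands in for a decoded sample; counts how many fields get read."""
    FIELDS = ("ip", "pid", "tid", "cpu", "time", "addr", "phys_addr", "period")

    def __init__(self, **values):
        self._values = values
        self.reads = 0

    def read(self, name):
        self.reads += 1  # each read models one C-to-Python conversion
        return self._values.get(name, 0)

    def __getattr__(self, name):
        # Only called for attributes not found normally, i.e. sample fields.
        if name in RawSample.FIELDS:
            return self.read(name)
        raise AttributeError(name)


def process_eager(sample, callback):
    """Script-style: convert every field to a dict entry, used or not."""
    event = {name: sample.read(name) for name in RawSample.FIELDS}
    callback(event)


def process_lazy(sample, callback):
    """Session-style: pass the object; fields are converted on access."""
    callback(sample)


if __name__ == "__main__":
    seen = []
    eager = RawSample(phys_addr=0x800)
    process_eager(eager, lambda ev: seen.append(ev["phys_addr"]))
    lazy = RawSample(phys_addr=0x800)
    process_lazy(lazy, lambda s: seen.append(s.phys_addr))
    print(eager.reads, lazy.reads, seen)
```

A callback that only needs `phys_addr` triggers one conversion in the lazy path, versus one per field in the eager path.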
In this series the mem-phys-addr.py command is ported from perf script
to use the session API. The performance before and after is:
Before:
```
$ perf mem record -a sleep 1
$ time perf script tools/perf/scripts/python/mem-phys-addr.py
Event: cpu_core/mem-loads-aux/
Memory type count percentage
--------------------------------------- ---------- ----------
0-fff : Reserved 3217 100.0
real 0m3.754s
user 0m0.023s
sys 0m0.018s
```
After:
```
$ PYTHONPATH=/tmp/perf/python time python3 tools/perf/python/mem-phys-addr.py
Event: evsel(cpu_core/mem-loads-aux/)
Memory type count percentage
--------------------------------------- ---------- ----------
0-fff : Reserved 3217 100.0
real 0m0.106s
user 0m0.021s
sys 0m0.020s
```
So roughly a 35x speedup, though it may be that some of that is one-time
start-up overhead of libpython, which would matter less for larger
perf.data files.
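The speedup figure follows from the two `real` times above. A short check, as a sketch: the `startup` parameter is a hypothetical knob for experimenting with how a fixed one-time cost would change the ratio, and no start-up cost was measured here.

```python
def speedup(before, after, startup=0.0):
    """Ratio of wall-clock times, optionally discounting a fixed start-up
    cost (hypothetical, in seconds) from the 'before' measurement."""
    return (before - startup) / after


# "real" times from the two runs above, in seconds.
ratio = speedup(3.754, 0.106)
print(f"{ratio:.1f}x")
```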
Before porting all the script commands and adding things like
callchain support to the python module, I wanted to get feedback. One
thing that particularly simplifies the series is adding reference
counts to evsel and evlist to avoid copying/cloning evsels created by
the session API when loading a perf.data file.
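To illustrate why reference counts remove the need for copying, here is a minimal Python sketch of get/put-style sharing. The class and method names are hypothetical and only model the idea; the actual change is in perf's C code for evsel and evlist.

```python
class RefCounted:
    """Hypothetical model of a reference-counted object such as an evsel."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1      # the creator holds the first reference
        self.freed = False

    def get(self):
        """Take another reference instead of cloning the object."""
        assert not self.freed, "use after free"
        self.refcnt += 1
        return self

    def put(self):
        """Drop a reference; release resources when the last one goes."""
        assert self.refcnt > 0
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True


# The session owns the evsel; a Python wrapper shares it rather than copying.
session_evsel = RefCounted("cpu_core/mem-loads-aux/")
wrapper_evsel = session_evsel.get()
assert wrapper_evsel is session_evsel   # same object, no clone

session_evsel.put()                     # the session goes away first
assert not wrapper_evsel.freed          # the wrapper's reference keeps it alive
wrapper_evsel.put()                     # last reference dropped
assert wrapper_evsel.freed
```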
The approach of moving away from libpython and scripts was most
recently discussed as a topic in:
https://lore.kernel.org/lkml/CAP-5=fWDqE8SYfOLZkg_0=4Ayx6E7O+h7uUp4NDeCFkiN4b7-w@mail.gmail.com/
When creating the python wrapper some housekeeping was done around
includes and perf_data's encapsulation.
The perf script callbacks differ from those in perf_tool. For example,
the stat callback in perf_tool is for a stat event, while the scripting
ops combine things and have a stat callback associated with
stat_round. Should the session API match the tool or the script API?
The former feels better for the long term, while the latter could simplify
porting perf scripts.
Ian Rogers (15):
perf arch arm: Sort includes and add missed explicit dependencies
perf arch x86: Sort includes and add missed explicit dependencies
perf tests: Sort includes and add missed explicit dependencies
perf script: Sort includes and add missed explicit dependencies
perf util: Sort includes and add missed explicit dependencies
perf python: Add add missed explicit dependencies
perf evsel/evlist: Avoid unnecessary #includes
perf maps: Move getting debug_file to verbose path
perf data: Clean up use_stdio and structures
perf python: Add wrapper for perf_data file abstraction
perf python: Add python session abstraction wrapping perf's session
perf evlist: Add reference count
perf evsel: Add reference count
perf python: Add access to evsel and phys_addr in event
perf mem-phys-addr.py: Port to standalone application from perf script
tools/perf/arch/arm/util/cs-etm.c | 22 +-
tools/perf/arch/x86/tests/hybrid.c | 2 +-
tools/perf/arch/x86/tests/topdown.c | 2 +-
tools/perf/arch/x86/util/intel-bts.c | 14 +-
tools/perf/arch/x86/util/intel-pt.c | 31 +-
tools/perf/arch/x86/util/iostat.c | 2 +-
tools/perf/bench/evlist-open-close.c | 18 +-
tools/perf/builtin-ftrace.c | 8 +-
tools/perf/builtin-inject.c | 7 +-
tools/perf/builtin-kvm.c | 4 +-
tools/perf/builtin-lock.c | 2 +-
tools/perf/builtin-record.c | 14 +-
tools/perf/builtin-script.c | 109 ++--
tools/perf/builtin-stat.c | 8 +-
tools/perf/builtin-top.c | 52 +-
tools/perf/builtin-trace.c | 38 +-
tools/perf/python/mem-phys-addr.py | 117 ++++
tools/perf/tests/backward-ring-buffer.c | 18 +-
tools/perf/tests/code-reading.c | 4 +-
tools/perf/tests/event-times.c | 4 +-
tools/perf/tests/event_update.c | 2 +-
tools/perf/tests/evsel-roundtrip-name.c | 8 +-
tools/perf/tests/evsel-tp-sched.c | 4 +-
tools/perf/tests/expand-cgroup.c | 8 +-
tools/perf/tests/hists_cumulate.c | 2 +-
tools/perf/tests/hists_filter.c | 2 +-
tools/perf/tests/hists_link.c | 2 +-
tools/perf/tests/hists_output.c | 2 +-
tools/perf/tests/hwmon_pmu.c | 14 +-
tools/perf/tests/keep-tracking.c | 2 +-
tools/perf/tests/mmap-basic.c | 31 +-
tools/perf/tests/openat-syscall-all-cpus.c | 6 +-
tools/perf/tests/openat-syscall-tp-fields.c | 18 +-
tools/perf/tests/openat-syscall.c | 6 +-
tools/perf/tests/parse-events.c | 4 +-
tools/perf/tests/parse-metric.c | 4 +-
tools/perf/tests/parse-no-sample-id-all.c | 2 +-
tools/perf/tests/perf-record.c | 18 +-
tools/perf/tests/perf-time-to-tsc.c | 2 +-
tools/perf/tests/pfm.c | 4 +-
tools/perf/tests/pmu-events.c | 6 +-
tools/perf/tests/pmu.c | 2 +-
tools/perf/tests/sw-clock.c | 14 +-
tools/perf/tests/switch-tracking.c | 2 +-
tools/perf/tests/task-exit.c | 14 +-
tools/perf/tests/tool_pmu.c | 2 +-
tools/perf/tests/topology.c | 5 +-
tools/perf/util/bpf_counter_cgroup.c | 2 +-
tools/perf/util/bpf_off_cpu.c | 28 +-
tools/perf/util/bpf_trace_augment.c | 7 +-
tools/perf/util/cgroup.c | 6 +-
tools/perf/util/data-convert-bt.c | 2 +-
tools/perf/util/data.c | 81 ++-
tools/perf/util/data.h | 52 +-
tools/perf/util/evlist.c | 100 ++--
tools/perf/util/evlist.h | 23 +-
tools/perf/util/evsel.c | 103 ++--
tools/perf/util/evsel.h | 30 +-
tools/perf/util/expr.c | 2 +-
tools/perf/util/header.c | 12 +-
tools/perf/util/map.h | 6 +-
tools/perf/util/maps.c | 9 +-
tools/perf/util/metricgroup.c | 6 +-
tools/perf/util/parse-events.c | 4 +-
tools/perf/util/parse-events.y | 2 +-
tools/perf/util/perf_api_probe.c | 19 +-
tools/perf/util/pfm.c | 2 +-
tools/perf/util/print-events.c | 2 +-
tools/perf/util/print_insn.h | 5 +-
tools/perf/util/python.c | 584 +++++++++++++++-----
tools/perf/util/record.c | 2 +-
tools/perf/util/s390-sample-raw.c | 15 +-
tools/perf/util/session.c | 4 +-
tools/perf/util/sideband_evlist.c | 16 +-
tools/perf/util/stat-shadow.c | 1 +
tools/perf/util/stat.c | 15 +-
76 files changed, 1152 insertions(+), 650 deletions(-)
create mode 100644 tools/perf/python/mem-phys-addr.py
--
2.51.1.851.g4ebd6896fd-goog