linux-kernel - Re: [PATCH v2 4/4] perf python: Add counting.py as example for counting perf events

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aCOwnUUVKx798Uza@x1>
Date: Tue, 13 May 2025 17:50:37 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Gautam Menghani <gautam@...ux.ibm.com>, namhyung@...nel.org,
	peterz@...radead.org, mingo@...hat.com, mark.rutland@....com,
	alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
	adrian.hunter@...el.com, kan.liang@...ux.intel.com,
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
	maddy@...ux.ibm.com
Subject: Re: [PATCH v2 4/4] perf python: Add counting.py as example for
 counting perf events

On Mon, May 12, 2025 at 12:38:23PM -0700, Ian Rogers wrote:
> On Mon, May 12, 2025 at 10:49 AM Arnaldo Carvalho de Melo
> <acme@...nel.org> wrote:
> >
> > On Mon, May 12, 2025 at 10:23:39AM -0700, Ian Rogers wrote:
> > > On Sun, May 11, 2025 at 10:58 PM Gautam Menghani <gautam@...ux.ibm.com> wrote:
> > > > Add counting.py - a python version of counting.c to demonstrate
> > > > measuring and reading of counts for given perf events.
> >
> > > > Signed-off-by: Gautam Menghani <gautam@...ux.ibm.com>
> > > > ---
> > > > v1 -> v2:
> > > > 1. Use existing iteration support instead of next
> > > > 2. Read the counters on all cpus
> > > > 3. Use existing helper functions
> > > >
> > > >  tools/perf/python/counting.py | 34 ++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 34 insertions(+)
> > > >  create mode 100755 tools/perf/python/counting.py
> >
> > > > diff --git a/tools/perf/python/counting.py b/tools/perf/python/counting.py
> > > > new file mode 100755
> > > > index 000000000000..e535e3ae8bdf
> > > > --- /dev/null
> > > > +++ b/tools/perf/python/counting.py
> > > > @@ -0,0 +1,34 @@
> > > > +#!/usr/bin/env python3
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +# -*- python -*-
> > > > +# -*- coding: utf-8 -*-
> > > > +
> > > > +import perf
> > > > +
> > > > +def main():
> > > > +        cpus = perf.cpu_map()
> > > > +        thread_map = perf.thread_map(-1)
> > > > +        evlist = perf.parse_events("cpu-clock,task-clock", cpus, thread_map)
> >
> > > Thanks Gautam! I think this is really good. Perhaps the events could
> > > be a command line option, but I can see why you want to keep this
> > > similar to counting.c.
> >
> > > > +
> > > > +        for ev in evlist:
> > > > +            ev.read_format = perf.FORMAT_TOTAL_TIME_ENABLED | perf.FORMAT_TOTAL_TIME_RUNNING
> > > > +
> > > > +        evlist.open()
> > > > +        evlist.enable()
> > > > +
> > > > +        count = 100000
> > > > +        while count > 0:
> > > > +            count -= 1
> > > > +
> > > > +        evlist.disable()
> > > > +
> > > > +        for evsel in evlist:
> > > > +            for cpu in cpus:
> > > > +                for thread in range(len(thread_map)):
> >
> > > I kind of wish, for the reason of being intention revealing, this could just be:
> >
> > > for thread in thread_map:
> >
> > > I can see the problem though, the counts lack the thread_map and the
> > > thread_map is needed to turn a thread back into an index. Perhaps when
> > > the python counts is created we hold onto the evsel so that this is
> > > possible. I also suspect that in the code:
> >
> > > for cpu in cpus:
> >
> > > The CPU number is being used rather than its index, which is a similar
> > > story/problem.
> >
> > Lemme see the rest of this code...
> >
> > +static PyObject *pyrf_evsel__read(struct pyrf_evsel *pevsel,
> > +                                 PyObject *args, PyObject *kwargs)
> > +{
> > +       struct evsel *evsel = &pevsel->evsel;
> > +       int cpu_map_idx = 0, thread = 0;
> > +       struct perf_counts_values counts;
> > +       struct pyrf_counts_values *count_values = PyObject_New(struct pyrf_counts_values,
> > +                                                              &pyrf_counts_values__type);
> > +
> > +       if (!PyArg_ParseTuple(args, "ii", &cpu_map_idx, &thread))
> > +               return NULL;
> > +
> > +       perf_evsel__read(&(evsel->core), cpu_map_idx, thread, &counts);
> > +       count_values->values = counts;
> > +       return (PyObject *)count_values;
> > +}
> >
> > Yeah, it is expecting the cpu_map_idx but the cpu number is being used,
> > that is a bug.
> >
> > The way perf_evsel__read() is implemented:
> >
> > int perf_evsel__read(struct perf_evsel *evsel, int cpu_map_idx, int thread,
> >                      struct perf_counts_values *count)
> >
> > It expects a cpu_map index, not a cpu and then a thread that in its
> > prototype seems to imply its not an index? But it is an index as it ends
> > up being the 'y' for:
> >
> >   xyarray__entry(_evsel->mmap, _cpu_map_idx, _thread))
> >
> > :-/
> 
> Yeah. In the C code we've pretty much committed to notions of cpu map
> index and CPU. We're more ambiguous with threads, but generally thread
> is actually thread index into the thread map. As you say it is for the
> xyarray so that we can densely pack things by index rather than having
> huge gaps, say between PIDs. For the python we don't have to have a
> 1:1 mapping with the C code, so I was wondering if we could just
> remove the notions of index and have them be implementation details?

Agreed, even in the C case I find it confusing to sometimes deal with
indexes and sometimes with real thread/cpu numbers, but if we try and
at least keep the variables/parameter naming reflecting that, then it
should be bearable.

> This would lead to an unfortunate O(log n) translation between
> thread/CPU and index (perf_cpu_map__idx) at the API boundary in
> python.c.

Maybe with some more thinking we can get something better? But I don't
have the bw now to think about it.
 
> > So probably its best to do it using indexes and when needing to know the
> > pid or cpu number then use some helper to get the entry at the given
> > entry? At least for the perf_evsel__read() API that seems to be the
> > case, right?

> > > Arnaldo, could you give some input on what to do wrt indices, threads
> > > and CPUs at the API level? Perhaps we need a refactor and objects for
> > > perf CPU and perf thread, similar to the use of struct perf_cpu in the
> > > C code. The original API all pre-dates that change. The issue is that
> > > changing the API could break existing scripts and we can only fix
> > > those that ship with perf.
> > So it was more for catching new/dead threads without having to process /proc.

<SNIP>
 
> Yeah. I think the sampling API is okay. The nice thing with Gautum's
> patches is adding support for counters for use-cases like perf stat.

Right, I like the effort he is making into having perf more usable in
python, and I encourage him to think about the issues you raised so that
we can come to some good abstractions.

- Arnaldo