linux-kernel - Re: [PATCH v2 11/11] perf python tracepoint: Switch to using parse

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z9CFwbXSGnGLn0lA@x1>
Date: Tue, 11 Mar 2025 15:49:37 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
	Namhyung Kim <namhyung@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Kan Liang <kan.liang@...ux.intel.com>,
	Yicong Yang <yangyicong@...ilicon.com>,
	James Clark <james.clark@...aro.org>,
	"Dr. David Alan Gilbert" <linux@...blig.org>,
	Levi Yun <yeoreum.yun@....com>, Ze Gao <zegao2021@...il.com>,
	Weilin Wang <weilin.wang@...el.com>, Xu Yang <xu.yang_2@....com>,
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
	Howard Chu <howardchu95@...il.com>
Subject: Re: [PATCH v2 11/11] perf python tracepoint: Switch to using
 parse_events

On Tue, Mar 11, 2025 at 02:06:45PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Mar 11, 2025 at 12:32:05PM -0300, Arnaldo Carvalho de Melo wrote:
> >                 if (field->flags & TEP_FIELD_IS_STRING &&
> >                     is_printable_array(data + offset, len)) {
> >                         ret = PyUnicode_FromString((char *)data + offset);
> >                 } else {
> >                         ret = PyByteArray_FromStringAndSize((const char *) data + offset, len);
> >                         field->flags &= ~TEP_FIELD_IS_STRING;
> >                 }
> > 
> > 
> > For some reason in sessions where bytearrays are returned, and all comes
> > as bytearrays, the problem is not noticed.
> > 
> > But when it comes as a string it breaks after a short time, /me
> > scratches head...
> 
> I think I'm getting closer, with some debugging sprinkled in the python
> binding:
> 
> ip 0 pid=74131 tid=74131 cpu=3
>  ( field 'prev_comm' ret=0x7f0a66a1b970 )  ( field 'prev_pid' ret=0x7f0a66b1bed0 )  ( field 'prev_prio' ret=0x7f0a74f60d08 )  ( field 'prev_state' ret=0x7f0a74f60e08 ) time 4503271651784 prev_comm=kworker/u112:14 prev_pid=74131 prev_prio=120 prev_state=0x80 ==>
>  ( field 'next_comm' ret=0x7f0a66a1b970 )  ( field 'next_pid' ret=0x7f0a74f5fe08 )  ( field 'next_prio' ret=0x7f0a74f60d08 ) next_comm=swapper/3 next_pid=0 next_prio=120
> ip 0 pid=80209 tid=80209 cpu=4
>  ( field 'prev_comm' ret=0x7f0a66a1b970 )  ( field 'prev_pid' ret=0x7f0a66b1bed0 )  ( field 'prev_prio' ret=0x7f0a74f60d08 )  ( field 'prev_state' ret=0x7f0a74f60e08 ) time 4503280531143 prev_comm=kworker/u112:3 prev_pid=80209 prev_prio=120 prev_state=0x80 ==>
>  ( field 'next_comm' ret=0x7f0a66a1b970 )  ( field 'next_pid' ret=0x7f0a74f5fe08 )  ( field 'next_prio' ret=0x7f0a74f60d08 ) next_comm=swapper/4 next_pid=0 next_prio=120
> ip 0 pid=74131 tid=74131 cpu=5
>  ( XXX '�!' len=16)  ( field 'prev_comm' ret=(nil) ) Traceback (most recent call last):
>   File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 51, in <module>
>     main()
>   File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 40, in main
>     event.prev_comm,
>     ^^^^^^^^^^^^^^^
> AttributeError: 'perf.sample_event' object has no attribute 'prev_comm'
> 
> 
> So the size of the field, in the tracepoint format file is 16, its a
> comm:
> 
> root@...ber:/home/acme/git/perf-tools-next# grep '_comm\[' /sys/kernel/tracing/events/sched/sched_switch/format 
> 	field:char prev_comm[16];	offset:8;	size:16;	signed:0;
> 	field:char next_comm[16];	offset:40;	size:16;	signed:0;
> root@...ber:/home/acme/git/perf-tools-next#
> 
> But:
> 
> root@...ber:/home/acme/git/perf-tools-next# cat /proc/74131/comm
> kworker/u112:14-events_unbound
> root@...ber:/home/acme/git/perf-tools-next#
> 
> root@...ber:/home/acme/git/perf-tools-next# echo -n "kworker/u112:14-events_unbound" | wc -c
> 30
> root@...ber:/home/acme/git/perf-tools-next#
> 
> Which should explain that:
> 
> ( XXX '�!' len=16)  ( field 'prev_comm' ret=(nil) )
> 
> That is printed by:
> 
> +++ b/tools/perf/util/python.c
> @@ -318,16 +318,17 @@ tracepoint_field(const struct pyrf_event *pe, struct tep_format_field *field)
>                         if (tep_field_is_relative(field->flags))
>                                 offset += field->offset + field->size;
>                 }
> -               if (field->flags & TEP_FIELD_IS_STRING &&
> -                   is_printable_array(data + offset, len)) {
> +               if (field->flags & TEP_FIELD_IS_STRING) {
>                         ret = PyUnicode_FromString((char *)data + offset);
> +                       if (ret == NULL) {
> +                               printf(" ( XXX '%.*s' len=%d) ", len, (char *)data + offset, len); fflush(stdout);
> +                       }
>                 } else {
>                         ret = PyByteArray_FromStringAndSize((const char *) data + offset, len);
> -                       field->flags &= ~TEP_FIELD_IS_STRING;
>                 }
> 
> So now trying to figure out why when the comm lenght is more than 16 the
> raw_data appears to contain trash...

So it seems to be something just in the python binding, as perf trace
seems to handle it well:

 ( field 'prev_comm' ret=0x7f7c31f65110, raw_size=68 )  ( field 'prev_pid' ret=0x7f7c23b1bed0, raw_size=68 )  ( field 'prev_prio' ret=0x7f7c239c0030, raw_size=68 )  ( field 'prev_state' ret=0x7f7c239c0250, raw_size=68 ) time 14771421785867 prev_comm= prev_pid=1919907691 prev_prio=796026219 prev_state=0x303a32313175 ==>
 ( XXX '��' len=16, raw_size=68)  ( field 'next_comm' ret=(nil), raw_size=68 ) Traceback (most recent call last):
  File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 51, in <module>
    main()
  File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 46, in main
    event.next_comm,
    ^^^^^^^^^^^^^^^
AttributeError: 'perf.sample_event' object has no attribute 'next_comm'
root@...ber:/home/acme/git/perf-tools-next# cat /proc/125355/comm
kworker/u112:0-i915
root@...ber:/home/acme/git/perf-tools-next# 
root@...ber:/home/acme/git/perf-tools-next# 
root@...ber:/home/acme/git/perf-tools-next# perf trace -e sched:sched_switch -p 125355
     0.000 sched:sched_switch(prev_comm: "kworker/u112:0", prev_pid: 125355 (kworker/u112:0-), prev_prio: 120, prev_state: 128, next_comm: "swapper/6", next_prio: 120)
^Croot@...ber:/home/acme/git/perf-tools-next#

I.e. it chops up the prev_comm size to what is specified in the
tracepoint format.

And that sample->raw_size is the same accross the sched:sched_switch
raw_datas (seemingly suboptimal, most are less than 16 bytes, but
probably its not guaranteed that the \0 will be there, so copy all the
16 bytes).

Now to try to figure out why simply using PyUnicode_FromStringAndSize
doesn't work...

- Arnaldo