linux-kernel - BUG: perf script python SIGSEGV

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <54196e8caa4ac8b4a40f89799656ad8defa02f6b.camel@nokia.com>
Date:   Wed, 5 Feb 2020 11:18:34 +0000
From:   "Huttunen, Janne (Nokia - FI/Espoo)" <janne.huttunen@...ia.com>
To:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: BUG: perf script python SIGSEGV


It appears that the perf command crashes when processing
recording with a Python script and the first recorded event
in happens to be "context_tracking:user_enter" or
"context_tracking:user_exit".


$ perf script -g Python
generated Python script: perf-script.py
$ perf script -s perf-script.py -i perf.data
in trace_begin
No kallsyms or vmlinux with build-id 0cd1e61e3703521ee4de14a1f13aac17b762fe57 was found
Segmentation fault (core dumped)


#0  0x00007ffff58f46fa in strlen () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007ffff5535360 in PyString_FromString ()
   from /lib64/libpython2.7.so.1.0
No symbol table info available.
#2  0x000000000059d52f in define_value (
    field_type=field_type@...ry=TEP_PRINT_FLAGS, 
    ev_name=ev_name@...ry=0xd73320 <handler_name> "context_tracking__user_enter", field_name=0x0, field_value=field_value@...ry=0x6ed556 "0", 
    field_str=0xd88ac0 "") at util/scripting-engines/trace-event-python.c:221
        handler_name = <optimized out>
        t = 0x7ffff7e849f0
        value = 0
        n = 1
#3  0x000000000059e5bd in define_event_symbols (event=event@...ry=0xd7f890, 
    ev_name=ev_name@...ry=0xd73320 <handler_name> "context_tracking__user_enter", args=<optimized out>) at util/scripting-engines/trace-event-python.c:282
No locals.
#4  0x000000000059f741 in python_process_tracepoint (al=0x7fffffffba20, 
    evsel=0xd88230, sample=0x7fffffffbce0)
    at util/scripting-engines/trace-event-python.c:821
        comm = 0x1040440 ":3682671"
        handler_name = "context_tracking__user_enter", '\000' <repeats 227 times>
        event = 0xd7f890
        handler = <optimized out>
        field = <optimized out>
        ns = <optimized out>
        n = 0
        context = <optimized out>
        obj = 0x0
        callchain = <optimized out>
        dict = 0x0
        s = <optimized out>
        pid = 3682671
        data = 0x7fffed1442dc
        default_handler_name = 0x6f5332 "trace_unhandled"
        t = 0x4c6b8b <machine__resolve+395>
        all_entries_dict = 0x0
        cpu = 16
        nsecs = 589187356866468
#5  python_process_event (event=<optimized out>, sample=0x7fffffffbce0, 
    evsel=0xd88230, al=0x7fffffffba20)
    at util/scripting-engines/trace-event-python.c:1321
        tables = 0xd73520 <tables_global>


The crash seems to happen because those events define the print
format like this:

  print fmt: "%s", ""

As far as I can see, this generates only a single TEP_PRINT_ATOM
arg. When processing the event, define_value() tries to convert
cur_field_name to a python string. Since it hasn't seen any
TEP_PRINT_FIELD args, the cur_field_name is still NULL which
triggers the SIGSEGV.

A trivial workaround for the crash would be something like this:

diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 80ca5d0..8fd03d5 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -208,6 +208,9 @@ static void define_value(enum tep_print_arg_type field_type,
 	unsigned long long value;
 	unsigned n = 0;
 
+	if (field_name == NULL)
+		return;
+
 	if (field_type == TEP_PRINT_SYMBOL)
 		handler_name = "define_symbolic_value";
 

On the other hand, while perf doesn't crash anymore with that
patch, in a more complex case it would likely still do
something that would not make much sense. As far as I can see,
some old value could easily be left in the file scoped
cur_field_name variable. A more correct fix would require
a bit better understanding of the intended semantics, like
e.g. what exactly is the apparently quite unused
'zero_flag_atom' variable in there supposed to achieve?