lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3d7a6f93-0555-48fa-99cb-bf26b53c2da5@os.amperecomputing.com>
Date: Wed, 7 Aug 2024 17:47:13 +0530
From: Ganapatrao Kulkarni <gankulkarni@...amperecomputing.com>
To: James Clark <james.clark@...aro.org>, scclevenger@...amperecomputing.com
Cc: acme@...hat.com, coresight@...ts.linaro.org,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 darren@...amperecomputing.com, james.clark@....com, suzuki.poulose@....com,
 Leo Yan <leo.yan@....com>, Al.Grant@....com,
 Mike Leach <mike.leach@...aro.org>
Subject: Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm
 if address continuity is broken



On 06-08-2024 09:44 pm, James Clark wrote:
> 
> 
> On 06/08/2024 4:02 pm, Steve Clevenger wrote:
>>
>>
>> On 8/6/2024 2:57 AM, James Clark wrote:
>>>
>>>
>>> On 06/08/2024 10:47 am, James Clark wrote:
>>>>
>>>>
>>>> On 06/08/2024 8:02 am, Ganapatrao Kulkarni wrote:
>>>>>
>>>>>
>>>>> On 05-08-2024 07:29 pm, James Clark wrote:
>>>>>>
>>>>>>
>>>>>> On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 01-08-2024 03:30 pm, James Clark wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi James,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>>> To generate the instruction tracing, script uses 2
>>>>>>>>>>>>>>>> contiguous packets
>>>>>>>>>>>>>>>> address range. If there a continuity brake due to
>>>>>>>>>>>>>>>> discontiguous branch
>>>>>>>>>>>>>>>> address, it is required to reset the tracing and start
>>>>>>>>>>>>>>>> tracing with the
>>>>>>>>>>>>>>>> new set of contiguous packets.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Adding change to identify the break and complete the
>>>>>>>>>>>>>>>> remaining tracing
>>>>>>>>>>>>>>>> of current packets and restart tracing from new set of
>>>>>>>>>>>>>>>> packets, if
>>>>>>>>>>>>>>>> continuity is established.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you add a before and after example of what's changed to
>>>>>>>>>>>>>>> the commit message? It wasn't immediately obvious to me if
>>>>>>>>>>>>>>> this is adding missing output, or it was correcting the
>>>>>>>>>>>>>>> tail end of the output that was previously wrong.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is adding tail end of the trace as well avoiding the
>>>>>>>>>>>>>> segfault of the perf application. With out this change the
>>>>>>>>>>>>>> perf segfaults with as below log
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ./perf script
>>>>>>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py --
>>>>>>>>>>>>>> -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>>>> objdump: error: the stop address should be after the start
>>>>>>>>>>>>>> address
>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>>     File "./scripts/python/arm-cs-trace-disasm.py", line 271,
>>>>>>>>>>>>>> in process_event
>>>>>>>>>>>>>>       print_disam(dso_fname, dso_vm_start, start_addr,
>>>>>>>>>>>>>> stop_addr)
>>>>>>>>>>>>>>     File "./scripts/python/arm-cs-trace-disasm.py", line 105,
>>>>>>>>>>>>>> in print_disam
>>>>>>>>>>>>>>       for line in read_disam(dso_fname, dso_start,
>>>>>>>>>>>>>> start_addr, stop_addr):
>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>     File "./scripts/python/arm-cs-trace-disasm.py", line 99,
>>>>>>>>>>>>>> in read_disam
>>>>>>>>>>>>>>       disasm_output =
>>>>>>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>>>                       ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>     File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>>>>>>>>> check_output
>>>>>>>>>>>>>>       return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>>>>>>> check=True,
>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>     File "/usr/lib64/python3.12/subprocess.py", line 571, 
>>>>>>>>>>>>>> in run
>>>>>>>>>>>>>>       raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d',
>>>>>>>>>>>>>> '-z', '--start-address=0xffff80008125b758',
>>>>>>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']'
>>>>>>>>>>>>>> returned non-zero exit status 1.
>>>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python
>>>>>>>>>>>>>> trace event handler
>>>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>>>>>>>     <no Python frame>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>>>>>>>>>>>>> <gankulkarni@...amperecomputing.com>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>    tools/perf/scripts/python/arm-cs-trace-disasm.py | 10
>>>>>>>>>>>>>>>> ++++++++++
>>>>>>>>>>>>>>>>    1 file changed, 10 insertions(+)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>> a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>>>>>>>            cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>>>>            return
>>>>>>>>>>>>>>>> +    if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>>>> +        cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't
>>>>>>>>>>>>>>> it get overwritten after you load it back into 'prev_ip'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> No, the logic is same as holding the addr of previous packet.
>>>>>>>>>>>>>> Saving the previous packet saved ip in to prev_ip before
>>>>>>>>>>>>>> overwriting with the current packet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>>>>>>> previous sample. For addr, we return on the first None, with
>>>>>>>>>>>>> your change we now "pretend" that the second one is also the
>>>>>>>>>>>>> previous one:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>>>>>>>       cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>       return  <----------------------------sample 0 return
>>>>>>>>>>>>>
>>>>>>>>>>>>>     if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>         cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but
>>>>>>>>>>>>> no return
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, it is dummy for first packet. Added anticipating that we
>>>>>>>>>>>> wont hit the discontinuity for the first packet itself.
>>>>>>>>>>>>
>>>>>>>>>>>> Can this be changed to more intuitive like below?
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>>>>>>>                   cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>                   return
>>>>>>>>>>>>
>>>>>>>>>>>> +       if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>>>>>>> +               prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>>>>>>
>>>>>>>>>>>>           if (options.verbose == True):
>>>>>>>>>>>>                   print("Event type: %s" % name)
>>>>>>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>>>>>>
>>>>>>>>>>>>           # Record for previous sample packet
>>>>>>>>>>>>           cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>> +       cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>>>>>>
>>>>>>>>>>>>           # Handle CS_ETM_TRACE_ON packet if start_addr=0 and
>>>>>>>>>>>> stop_addr=4
>>>>>>>>>>>>           if (start_addr == 0 and stop_addr == 4):
>>>>>>>>>>>>                   print("CPU%d: CS_ETM_TRACE_ON packet is
>>>>>>>>>>>> inserted" % cpu)
>>>>>>>>>>>>                   return
>>>>>>>>>>>>
>>>>>>>>>>>> +       if (stop_addr < start_addr and prev_ip != 0):
>>>>>>>>>>>> +               # Continuity of the Packets broken, set
>>>>>>>>>>>> start_addr to previous
>>>>>>>>>>>> +               # packet ip to complete the remaining tracing
>>>>>>>>>>>> of the address range.
>>>>>>>>>>>> +               start_addr = prev_ip
>>>>>>>>>>>> +
>>>>>>>>>>>>           if (start_addr < int(dso_start) or start_addr >
>>>>>>>>>>>> int(dso_end)):
>>>>>>>>>>>>                   print("Start address 0x%x is out of range [
>>>>>>>>>>>> 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start),
>>>>>>>>>>>> int(dso_end), dso))
>>>>>>>>>>>>                   return
>>>>>>>>>>>>
>>>>>>>>>>>> Without this patch below is the failure log(with segfault) for
>>>>>>>>>>>> reference.
>>>>>>>>>>>>
>>>>>>>>>>>> [root@...01sys-r214 perf]# timeout 4s ./perf record -e
>>>>>>>>>>>> cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>>>>>>> [root@...01sys-r214 perf]# ./perf script
>>>>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>>>>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>> objdump: error: the stop address should be after the start
>>>>>>>>>>>> address
>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>     File "./scripts/python/arm-cs-trace-disasm.py", line 271,
>>>>>>>>>>>> in process_event
>>>>>>>>>>>>       print_disam(dso_fname, dso_vm_start, start_addr, 
>>>>>>>>>>>> stop_addr)
>>>>>>>>>>>>     File "./scripts/python/arm-cs-trace-disasm.py", line 105,
>>>>>>>>>>>> in print_disam
>>>>>>>>>>>>       for line in read_disam(dso_fname, dso_start, start_addr,
>>>>>>>>>>>> stop_addr):
>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>     File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
>>>>>>>>>>>> read_disam
>>>>>>>>>>>>       disasm_output =
>>>>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>                       ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>     File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>>>>>>> check_output
>>>>>>>>>>>>       return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>>>>> check=True,
>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>     File "/usr/lib64/python3.12/subprocess.py", line 571, in 
>>>>>>>>>>>> run
>>>>>>>>>>>>       raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d',
>>>>>>>>>>>> '-z', '--start-address=0xffff80008125b758',
>>>>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']'
>>>>>>>>>>>> returned non-zero exit status 1.
>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace
>>>>>>>>>>>> event handler
>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>
>>>>>>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>>>>>>>     <no Python frame>
>>>>>>>>>>>>
>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> dump snippet:
>>>>>>>>>>>> ============
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid:
>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>           ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>>           ffff800080313f04:       36100094        tbz     w20,
>>>>>>>>>>>> #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>>           ffff800080313f08:       f941e6a0        ldr     x0,
>>>>>>>>>>>> [x21, #968]
>>>>>>>>>>>>           ffff800080313f0c:       d63f0000        blr     x0
>>>>>>>>>>>>               perf 12720/12720 [0001]      5986.372298040
>>>>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>>>>> .../coresight/linux/kernel/events/core.c  586         return
>>>>>>>>>>>> event->clock();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid:
>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>           ffff80008030cb00 <local_clock>:
>>>>>>>>>>>>           ffff80008030cb00:       d503233f        paciasp
>>>>>>>>>>>>           ffff80008030cb04:       a9bf7bfd        stp     x29,
>>>>>>>>>>>> x30, [sp, #-16]!
>>>>>>>>>>>>           ffff80008030cb08:       910003fd        mov     
>>>>>>>>>>>> x29, sp
>>>>>>>>>>>>           ffff80008030cb0c:       97faba67        bl
>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>>               perf 12720/12720 [0001]      5986.372298040
>>>>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h   64
>>>>>>>>>>>> return sched_clock();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid:
>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>           ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>>           ffff8000801bb4a8:       d503233f        paciasp
>>>>>>>>>>>>           ffff8000801bb4ac:       a9be7bfd        stp     x29,
>>>>>>>>>>>> x30, [sp, #-32]!
>>>>>>>>>>>>           ffff8000801bb4b0:       910003fd        mov     
>>>>>>>>>>>> x29, sp
>>>>>>>>>>>>           ffff8000801bb4b4:       a90153f3        stp     x19,
>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>           ffff8000801bb4b8:       d5384113        mrs     x19,
>>>>>>>>>>>> sp_el0
>>>>>>>>>>>>           ffff8000801bb4bc:       b9401260        ldr     w0,
>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>           ffff8000801bb4c0:       11000400        add     w0,
>>>>>>>>>>>> w0, #0x1
>>>>>>>>>>>>           ffff8000801bb4c4:       b9001260        str     w0,
>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>           ffff8000801bb4c8:       94427cf8        bl
>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>>               perf 12720/12720 [0001]      5986.372298040
>>>>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c  105
>>>>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid:
>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>           ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>>           ffff80008125a8a8:       d503233f        paciasp
>>>>>>>>>>>>           ffff80008125a8ac:       a9bc7bfd        stp     x29,
>>>>>>>>>>>> x30, [sp, #-64]!
>>>>>>>>>>>>           ffff80008125a8b0:       910003fd        mov     
>>>>>>>>>>>> x29, sp
>>>>>>>>>>>>           ffff80008125a8b4:       a90153f3        stp     x19,
>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>           ffff80008125a8b8:       b000e354        adrp    x20,
>>>>>>>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>>           ffff80008125a8bc:       910d0294        add     x20,
>>>>>>>>>>>> x20, #0x340
>>>>>>>>>>>>           ffff80008125a8c0:       a90363f7        stp     x23,
>>>>>>>>>>>> x24, [sp, #48]
>>>>>>>>>>>>           ffff80008125a8c4:       91002297        add     x23,
>>>>>>>>>>>> x20, #0x8
>>>>>>>>>>>>           ffff80008125a8c8:       52800518        mov     w24,
>>>>>>>>>>>> #0x28                  // #40
>>>>>>>>>>>>           ffff80008125a8cc:       a9025bf5        stp     x21,
>>>>>>>>>>>> x22, [sp, #32]
>>>>>>>>>>>>           ffff80008125a8d0:       b9400296        ldr     w22,
>>>>>>>>>>>> [x20]
>>>>>>>>>>>>           ffff80008125a8d4:       120002d5        and     w21,
>>>>>>>>>>>> w22, #0x1
>>>>>>>>>>>>           ffff80008125a8d8:       9bb87eb5        umull   x21,
>>>>>>>>>>>> w21, w24
>>>>>>>>>>>>           ffff80008125a8dc:       8b1502f3        add     x19,
>>>>>>>>>>>> x23, x21
>>>>>>>>>>>>           ffff80008125a8e0:       f9400e60        ldr     x0,
>>>>>>>>>>>> [x19, #24]
>>>>>>>>>>>>           ffff80008125a8e4:       d63f0000        blr     x0
>>>>>>>>>>>>               perf 12720/12720 [0001]      5986.372298040
>>>>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93
>>>>>>>>>>>> cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid:
>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> With fix:
>>>>>>>>>>>> =========
>>>>>>>>>>>>
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid:
>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>           ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>>           ffff800080313f04:       36100094        tbz     w20,
>>>>>>>>>>>> #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>>           ffff800080313f08:       f941e6a0        ldr     x0,
>>>>>>>>>>>> [x21, #968]
>>>>>>>>>>>>           ffff800080313f0c:       d63f0000        blr     x0
>>>>>>>>>>>>               perf 12720/12720 [0001]      5986.372298040
>>>>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>>>>> .../coresight/linux/kernel/events/core.c  586         return
>>>>>>>>>>>> event->clock();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid:
>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>           ffff80008030cb00 <local_clock>:
>>>>>>>>>>>>           ffff80008030cb00:       d503233f        paciasp
>>>>>>>>>>>>           ffff80008030cb04:       a9bf7bfd        stp     x29,
>>>>>>>>>>>> x30, [sp, #-16]!
>>>>>>>>>>>>           ffff80008030cb08:       910003fd        mov     
>>>>>>>>>>>> x29, sp
>>>>>>>>>>>>           ffff80008030cb0c:       97faba67        bl
>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>>               perf 12720/12720 [0001]      5986.372298040
>>>>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h   64
>>>>>>>>>>>> return sched_clock();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid:
>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>           ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>>           ffff8000801bb4a8:       d503233f        paciasp
>>>>>>>>>>>>           ffff8000801bb4ac:       a9be7bfd        stp     x29,
>>>>>>>>>>>> x30, [sp, #-32]!
>>>>>>>>>>>>           ffff8000801bb4b0:       910003fd        mov     
>>>>>>>>>>>> x29, sp
>>>>>>>>>>>>           ffff8000801bb4b4:       a90153f3        stp     x19,
>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>           ffff8000801bb4b8:       d5384113        mrs     x19,
>>>>>>>>>>>> sp_el0
>>>>>>>>>>>>           ffff8000801bb4bc:       b9401260        ldr     w0,
>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>           ffff8000801bb4c0:       11000400        add     w0,
>>>>>>>>>>>> w0, #0x1
>>>>>>>>>>>>           ffff8000801bb4c4:       b9001260        str     w0,
>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>           ffff8000801bb4c8:       94427cf8        bl
>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>>               perf 12720/12720 [0001]      5986.372298040
>>>>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c  105
>>>>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid:
>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>           ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>>           ffff80008125a8a8:       d503233f        paciasp
>>>>>>>>>>>>           ffff80008125a8ac:       a9bc7bfd        stp     x29,
>>>>>>>>>>>> x30, [sp, #-64]!
>>>>>>>>>>>>           ffff80008125a8b0:       910003fd        mov     
>>>>>>>>>>>> x29, sp
>>>>>>>>>>>>           ffff80008125a8b4:       a90153f3        stp     x19,
>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>           ffff80008125a8b8:       b000e354        adrp    x20,
>>>>>>>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>>           ffff80008125a8bc:       910d0294        add     x20,
>>>>>>>>>>>> x20, #0x340
>>>>>>>>>>>>           ffff80008125a8c0:       a90363f7        stp     x23,
>>>>>>>>>>>> x24, [sp, #48]
>>>>>>>>>>>>           ffff80008125a8c4:       91002297        add     x23,
>>>>>>>>>>>> x20, #0x8
>>>>>>>>>>>>           ffff80008125a8c8:       52800518        mov     w24,
>>>>>>>>>>>> #0x28                  // #40
>>>>>>>>>>>>           ffff80008125a8cc:       a9025bf5        stp     x21,
>>>>>>>>>>>> x22, [sp, #32]
>>>>>>>>>>>>           ffff80008125a8d0:       b9400296        ldr     w22,
>>>>>>>>>>>> [x20]
>>>>>>>>>>>>           ffff80008125a8d4:       120002d5        and     w21,
>>>>>>>>>>>> w22, #0x1
>>>>>>>>>>>>           ffff80008125a8d8:       9bb87eb5        umull   x21,
>>>>>>>>>>>> w21, w24
>>>>>>>>>>>>           ffff80008125a8dc:       8b1502f3        add     x19,
>>>>>>>>>>>> x23, x21
>>>>>>>>>>>>           ffff80008125a8e0:       f9400e60        ldr     x0,
>>>>>>>>>>>> [x19, #24]
>>>>>>>>>>>>           ffff80008125a8e4:       d63f0000        blr     x0
>>>>>>>>>>>
>>>>>>>>>>> It looks like the disassembly now assumes this BLR wasn't
>>>>>>>>>>> taken. We go from ffff80008125a8e4 straight through to ...
>>>>>>>>>>>
>>>>>>>>>>>>               perf 12720/12720 [0001]      5986.372298040
>>>>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93
>>>>>>>>>>>> cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid:
>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>           ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>>>>>>>           ffff80008125a8e8:       f8756ae3        ldr     x3,
>>>>>>>>>>>> [x23, x21]
>>>>>>>>>>>
>>>>>>>>>>> ffff80008125a8e4 which is just the previous one +4. Isn't your
>>>>>>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>>>>>>> discontinuity without branch samples being generated where
>>>>>>>>>>> either the source or destination address is 0?
>>>>>>>>>>>
>>>>>>>>>>> What are your record options to create this issue? As I
>>>>>>>>>>> mentioned in the previous reply I haven't been able to
>>>>>>>>>>> reproduce it.
>>>>>>>>>>
>>>>>>>>>> I am using below perf record command.
>>>>>>>>>>
>>>>>>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero
>>>>>>>>>> of=/dev/null
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks I managed to reproduce it. I'll take a look to see if I
>>>>>>>>> think the issue is somewhere else.
>>>>>>>>>
>>>>>>>>
>>>>>>>> At least for the failures I encountered, the issue is due to the
>>>>>>>> alternatives runtime instruction patching mechanism. vmlinux ends
>>>>>>>> up being the wrong image to decode with because a load of branches
>>>>>>>> are actually turned into nops.
>>>>>>>>
>>>>>>>> Can you confirm if you use --kcore instead of vmlinux that you
>>>>>>>> still get failures:
>>>>>>>>
>>>>>>>>     sudo perf record -e cs_etm// -C 1 --kcore -o
>>>>>>>> <output-folder.data> -- \
>>>>>>>>       dd if=/dev/zero of=/dev/null
>>>>>>>>
>>>>>>>>      perf script -i <output-folder.data> \
>>>>>>>>       tools/perf/scripts/python/arm-cs-trace-disasm.py -d
>>>>>>>> llvm-objdump \
>>>>>>>>       -k <output-folder.data>/kcore_dir/kcore
>>>>>>>>
>>>>>>>
>>>>>>> With below command combination with kcore also the issue is seen,
>>>>>>> as reported in this email chain.
>>>>>>>
>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>
>>>>>>> ./perf script -i kcore/data \
>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>>>>>> -d objdump -k kcore/kcore_dir/kcore
>>>>>>>
>>>>>>>
>>>>>>> However, with below sequence(same as your command) the issue is
>>>>>>> *not* seen.
>>>>>>>
>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>
>>>>>>> ./perf script -i kcore/data 
>>>>>>> ./scripts/python/arm-cs-trace-disasm.py \
>>>>>>> -- -d objdump -k kcore/kcore_dir/kcore
>>>>>>>
>>>>>>> Do you see any issue with the command, which is showing the problem?
>>>>>>> Also the output log produced by these both commands is different.
>>>>>>>
>>>>
>>>> BTW are you running this on the target or somewhere else? It's
>>>> suspicious that "-i kcore/data" works at all because there is no
>>>> kernel image given to Perf. Unless you are running on the target and
>>>> then I think it will just open the one from /proc. Or maybe it uses
>>>> /boot/vmlinux by default which also wouldn't work.
>>>>

Yes All tests are done natively on Ampere's ARM64 platform.
some more combination of commands which are also failing.

./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -- -d 
objdump -k kcore/kcore_dir/kcore

./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -F 
cpu,event,ip,addr,sym -- -d objdump -k kcore/kcore_dir/kcore

  ./perf script -i ./kcore scripts/python/arm-cs-trace-disasm.py -d 
objdump -k kcore/kcore_dir/kcore

>>>> Also the difference between "--script=python:" and just giving the
>>>> script name is in the parsing of the arguments following " -- ".
>>>> Sometimes they're also parsed as Perf arguments (like the -v becomes
>>>> perf verbose and -k becomes the Perf vmlinux rather than the script).
>>>>
>>>> I _think_ you want the " -- " when "--script" is used, and no "--"
>>>> when it's not. But there are some other combinations and you'll have
>>>> to debug it to compare your two exact scenarios to see why they're
>>>> different.
>>>>
>>>> But ignoring that issue with the argument format, you mentioned you
>>>> didn't see the issue any more with one version of --kcore. So I'm
>>>> assuming that confirms the issue is just a decode image issue, so we
>>>> shouldn't try to patch this script?
>>>>
>>>
>>> Although one change we should make to the script is change the example
>>> to use kcore. We can leave in one vmlinux one as an example if kcore
>>> isn't available, but add a note that it will fail if any patched code is
>>> traced (which is almost always).
>>
>> James, you may recall the year old thread
>> https://lore.kernel.org/all/ed8cea4c-a261-60ca-f4e1-333ec73cca8f@os.amperecomputing.com.
>> I described there an awkward workaround Ampere has used to solve the
>> patched code problem. At the time, it sounded like the maintainers were
>> interested in getting away from using the python script, mostly for
>> speed purposes. I didn't see the discussion go any further.
>>
> 
> Oh yes thanks for the reminder. I wasn't thinking about the source code 
> lines and debug symbols in this thread. I suppose your merging of kcore 
> and vmlinux gives both the correct image and the symbols, but I was only 
> focused on the image being correct, so only kcore was enough.
> 
> It looks like everything we want to do from your previous thread is in 
> addition to the fixes from this one. Even if we auto merge kcore + 
> symbols and move the disassembly into Perf, we still want to detect 
> decode issues earlier and not have IPs jumping backwards. Whether it's 
> the script or Perf doing that the behavior should be the same.
> 

Since it is breaking the decode, can we please add as a fix to drop the 
packets from decode when the discontinuity is seen (with warning message 
in verbose mode)? like below diff?

--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -257,6 +257,11 @@ def process_event(param_dict):
                 print("Stop address 0x%x is out of range [ 0x%x .. 0x%x 
] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
                 return

+       if (stop_addr < start_addr):
+               if (options.verbose == True):
+                       print("Packet Dropped, Discontinuity detected 
[stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
+               return
+
         if (options.objdump_name != None):
                 # It doesn't need to decrease virtual memory offset for 
disassembly
                 # for kernel dso and executable file dso, so in this 
case we set


> To summarise I think these are the changes to make:
> 
>    * Improve bad decode detection in OpenCSD
>    * Get the script to auto merge kcore and vmlinux
>      * Maybe we could get Perf to do this if both a kcore folder and -k
>        vmlinux are used?
>    * Improve the performance, either in the script or move more
>      functionality into Perf

Thanks,
Ganapat

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ