lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <00fac24c-d664-4ebb-8c60-f4697b7f76c1@linaro.org>
Date: Mon, 5 Aug 2024 14:59:27 +0100
From: James Clark <james.clark@...aro.org>
To: Ganapatrao Kulkarni <gankulkarni@...amperecomputing.com>
Cc: acme@...hat.com, coresight@...ts.linaro.org,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 darren@...amperecomputing.com, scclevenger@...amperecomputing.com,
 james.clark@....com, suzuki.poulose@....com, Leo Yan <leo.yan@....com>,
 Al.Grant@....com, Mike Leach <mike.leach@...aro.org>
Subject: Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm
 if address continuity is broken



On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
> 
> 
> On 01-08-2024 03:30 pm, James Clark wrote:
>>
>>
>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>
>>>
>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>
>>>>
>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>
>>>>>
>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>>
>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>
>>>>>>>> Hi James,
>>>>>>>>
>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>> To generate the instruction tracing, script uses 2 contiguous 
>>>>>>>>>> packets
>>>>>>>>>> address range. If there a continuity brake due to 
>>>>>>>>>> discontiguous branch
>>>>>>>>>> address, it is required to reset the tracing and start tracing 
>>>>>>>>>> with the
>>>>>>>>>> new set of contiguous packets.
>>>>>>>>>>
>>>>>>>>>> Adding change to identify the break and complete the remaining 
>>>>>>>>>> tracing
>>>>>>>>>> of current packets and restart tracing from new set of 
>>>>>>>>>> packets, if
>>>>>>>>>> continuity is established.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>
>>>>>>>>> Can you add a before and after example of what's changed to the 
>>>>>>>>> commit message? It wasn't immediately obvious to me if this is 
>>>>>>>>> adding missing output, or it was correcting the tail end of the 
>>>>>>>>> output that was previously wrong.
>>>>>>>>
>>>>>>>> It is adding tail end of the trace as well avoiding the segfault 
>>>>>>>> of the perf application. With out this change the perf segfaults 
>>>>>>>> with as below log
>>>>>>>>
>>>>>>>>
>>>>>>>> ./perf script 
>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d 
>>>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>>>> objdump: error: the stop address should be after the start address
>>>>>>>> Traceback (most recent call last):
>>>>>>>>    File "./scripts/python/arm-cs-trace-disasm.py", line 271, in 
>>>>>>>> process_event
>>>>>>>>      print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>>>    File "./scripts/python/arm-cs-trace-disasm.py", line 105, in 
>>>>>>>> print_disam
>>>>>>>>      for line in read_disam(dso_fname, dso_start, start_addr, 
>>>>>>>> stop_addr):
>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>    File "./scripts/python/arm-cs-trace-disasm.py", line 99, in 
>>>>>>>> read_disam
>>>>>>>>      disasm_output = 
>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>                      ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>    File "/usr/lib64/python3.12/subprocess.py", line 466, in 
>>>>>>>> check_output
>>>>>>>>      return run(*popenargs, stdout=PIPE, timeout=timeout, 
>>>>>>>> check=True,
>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>    File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>>>      raise CalledProcessError(retcode, process.args,
>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', 
>>>>>>>> '--start-address=0xffff80008125b758', 
>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned 
>>>>>>>> non-zero exit status 1.
>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace 
>>>>>>>> event handler
>>>>>>>> Python runtime state: initialized
>>>>>>>>
>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>    <no Python frame>
>>>>>>>>
>>>>>>>> Extension modules: perf_trace_context, systemd._journal, 
>>>>>>>> systemd._reader, systemd.id128, report._py3report, 
>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>> Aborted (core dumped)
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni 
>>>>>>>>>> <gankulkarni@...amperecomputing.com>
>>>>>>>>>> ---
>>>>>>>>>>   tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 
>>>>>>>>>> ++++++++++
>>>>>>>>>>   1 file changed, 10 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py 
>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>           cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>           return
>>>>>>>>>> +    if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>> +        cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't it 
>>>>>>>>> get overwritten after you load it back into 'prev_ip'
>>>>>>>>
>>>>>>>> No, the logic is same as holding the addr of previous packet.
>>>>>>>> Saving the previous packet saved ip in to prev_ip before 
>>>>>>>> overwriting with the current packet.
>>>>>>>
>>>>>>> It's not exactly the same logic as holding the addr of the 
>>>>>>> previous sample. For addr, we return on the first None, with your 
>>>>>>> change we now "pretend" that the second one is also the previous 
>>>>>>> one:
>>>>>>>
>>>>>>>    if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>      cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>      return  <----------------------------sample 0 return
>>>>>>>
>>>>>>>    if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>        cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no 
>>>>>>> return
>>>>>>>
>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>>>>>
>>>>>> Yes, it is dummy for first packet. Added anticipating that we wont 
>>>>>> hit the discontinuity for the first packet itself.
>>>>>>
>>>>>> Can this be changed to more intuitive like below?
>>>>>>
>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py 
>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>                  cpu_data[str(cpu) + 'addr'] = addr
>>>>>>                  return
>>>>>>
>>>>>> +       if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>> +               prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>
>>>>>>          if (options.verbose == True):
>>>>>>                  print("Event type: %s" % name)
>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>
>>>>>>          # Record for previous sample packet
>>>>>>          cpu_data[str(cpu) + 'addr'] = addr
>>>>>> +       cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>
>>>>>>          # Handle CS_ETM_TRACE_ON packet if start_addr=0 and 
>>>>>> stop_addr=4
>>>>>>          if (start_addr == 0 and stop_addr == 4):
>>>>>>                  print("CPU%d: CS_ETM_TRACE_ON packet is inserted" 
>>>>>> % cpu)
>>>>>>                  return
>>>>>>
>>>>>> +       if (stop_addr < start_addr and prev_ip != 0):
>>>>>> +               # Continuity of the Packets broken, set start_addr 
>>>>>> to previous
>>>>>> +               # packet ip to complete the remaining tracing of 
>>>>>> the address range.
>>>>>> +               start_addr = prev_ip
>>>>>> +
>>>>>>          if (start_addr < int(dso_start) or start_addr > 
>>>>>> int(dso_end)):
>>>>>>                  print("Start address 0x%x is out of range [ 0x%x 
>>>>>> .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), 
>>>>>> dso))
>>>>>>                  return
>>>>>>
>>>>>> Without this patch below is the failure log(with segfault) for 
>>>>>> reference.
>>>>>>
>>>>>> [root@...01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 
>>>>>> 1 dd if=/dev/zero of=/dev/null
>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>> [root@...01sys-r214 perf]# ./perf script 
>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d 
>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>> objdump: error: the stop address should be after the start address
>>>>>> Traceback (most recent call last):
>>>>>>    File "./scripts/python/arm-cs-trace-disasm.py", line 271, in 
>>>>>> process_event
>>>>>>      print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>>>>    File "./scripts/python/arm-cs-trace-disasm.py", line 105, in 
>>>>>> print_disam
>>>>>>      for line in read_disam(dso_fname, dso_start, start_addr, 
>>>>>> stop_addr):
>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>    File "./scripts/python/arm-cs-trace-disasm.py", line 99, in 
>>>>>> read_disam
>>>>>>      disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>>>>                      ^^^^^^^^^^^^^^^^^^^^
>>>>>>    File "/usr/lib64/python3.12/subprocess.py", line 466, in 
>>>>>> check_output
>>>>>>      return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>>>>             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>    File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>>>>      raise CalledProcessError(retcode, process.args,
>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z', 
>>>>>> '--start-address=0xffff80008125b758', 
>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned 
>>>>>> non-zero exit status 1.
>>>>>> Fatal Python error: handler_call_die: problem in Python trace 
>>>>>> event handler
>>>>>> Python runtime state: initialized
>>>>>>
>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>    <no Python frame>
>>>>>>
>>>>>> Extension modules: perf_trace_context, systemd._journal, 
>>>>>> systemd._reader, systemd.id128, report._py3report, _dbus_bindings, 
>>>>>> problem._py3abrt (total: 7)
>>>>>> Aborted (core dumped)
>>>>>>
>>>>>>
>>>>>> dump snippet:
>>>>>> ============
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 
>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 
>>>>>> period: 1 time: 5986372298040 }
>>>>>>          ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>          ffff800080313f04:       36100094        tbz     w20, #2, 
>>>>>> ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>          ffff800080313f08:       f941e6a0        ldr     x0, [x21, 
>>>>>> #968]
>>>>>>          ffff800080313f0c:       d63f0000        blr     x0
>>>>>>              perf 12720/12720 [0001]      5986.372298040 
>>>>>> __perf_event_header__init_id+0x54 
>>>>>> .../coresight/linux/kernel/events/core.c  586         return 
>>>>>> event->clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 
>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 
>>>>>> period: 1 time: 5986372298040 }
>>>>>>          ffff80008030cb00 <local_clock>:
>>>>>>          ffff80008030cb00:       d503233f        paciasp
>>>>>>          ffff80008030cb04:       a9bf7bfd        stp     x29, x30, 
>>>>>> [sp, #-16]!
>>>>>>          ffff80008030cb08:       910003fd        mov     x29, sp
>>>>>>          ffff80008030cb0c:       97faba67        bl 
>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>              perf 12720/12720 [0001]      5986.372298040 
>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h   64 
>>>>>> return sched_clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 
>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 
>>>>>> period: 1 time: 5986372298040 }
>>>>>>          ffff8000801bb4a8 <sched_clock>:
>>>>>>          ffff8000801bb4a8:       d503233f        paciasp
>>>>>>          ffff8000801bb4ac:       a9be7bfd        stp     x29, x30, 
>>>>>> [sp, #-32]!
>>>>>>          ffff8000801bb4b0:       910003fd        mov     x29, sp
>>>>>>          ffff8000801bb4b4:       a90153f3        stp     x19, x20, 
>>>>>> [sp, #16]
>>>>>>          ffff8000801bb4b8:       d5384113        mrs     x19, sp_el0
>>>>>>          ffff8000801bb4bc:       b9401260        ldr     w0, [x19, 
>>>>>> #16]
>>>>>>          ffff8000801bb4c0:       11000400        add     w0, w0, #0x1
>>>>>>          ffff8000801bb4c4:       b9001260        str     w0, [x19, 
>>>>>> #16]
>>>>>>          ffff8000801bb4c8:       94427cf8        bl 
>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>              perf 12720/12720 [0001]      5986.372298040 
>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c  105 ns 
>>>>>> = sched_clock_noinstr();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 
>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 
>>>>>> period: 1 time: 5986372298040 }
>>>>>>          ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>          ffff80008125a8a8:       d503233f        paciasp
>>>>>>          ffff80008125a8ac:       a9bc7bfd        stp     x29, x30, 
>>>>>> [sp, #-64]!
>>>>>>          ffff80008125a8b0:       910003fd        mov     x29, sp
>>>>>>          ffff80008125a8b4:       a90153f3        stp     x19, x20, 
>>>>>> [sp, #16]
>>>>>>          ffff80008125a8b8:       b000e354        adrp    x20, 
>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>          ffff80008125a8bc:       910d0294        add     x20, x20, 
>>>>>> #0x340
>>>>>>          ffff80008125a8c0:       a90363f7        stp     x23, x24, 
>>>>>> [sp, #48]
>>>>>>          ffff80008125a8c4:       91002297        add     x23, x20, 
>>>>>> #0x8
>>>>>>          ffff80008125a8c8:       52800518        mov     w24, 
>>>>>> #0x28                  // #40
>>>>>>          ffff80008125a8cc:       a9025bf5        stp     x21, x22, 
>>>>>> [sp, #32]
>>>>>>          ffff80008125a8d0:       b9400296        ldr     w22, [x20]
>>>>>>          ffff80008125a8d4:       120002d5        and     w21, w22, 
>>>>>> #0x1
>>>>>>          ffff80008125a8d8:       9bb87eb5        umull   x21, w21, 
>>>>>> w24
>>>>>>          ffff80008125a8dc:       8b1502f3        add     x19, x23, 
>>>>>> x21
>>>>>>          ffff80008125a8e0:       f9400e60        ldr     x0, [x19, 
>>>>>> #24]
>>>>>>          ffff80008125a8e4:       d63f0000        blr     x0
>>>>>>              perf 12720/12720 [0001]      5986.372298040 
>>>>>> sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 
>>>>>> 93                 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 
>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 
>>>>>> period: 1 time: 5986372298040 }
>>>>>>
>>>>>>
>>>>>> With fix:
>>>>>> =========
>>>>>>
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 
>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 
>>>>>> period: 1 time: 5986372298040 }
>>>>>>          ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>>>>>>          ffff800080313f04:       36100094        tbz     w20, #2, 
>>>>>> ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>          ffff800080313f08:       f941e6a0        ldr     x0, [x21, 
>>>>>> #968]
>>>>>>          ffff800080313f0c:       d63f0000        blr     x0
>>>>>>              perf 12720/12720 [0001]      5986.372298040 
>>>>>> __perf_event_header__init_id+0x54 
>>>>>> .../coresight/linux/kernel/events/core.c  586         return 
>>>>>> event->clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 
>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 
>>>>>> period: 1 time: 5986372298040 }
>>>>>>          ffff80008030cb00 <local_clock>:
>>>>>>          ffff80008030cb00:       d503233f        paciasp
>>>>>>          ffff80008030cb04:       a9bf7bfd        stp     x29, x30, 
>>>>>> [sp, #-16]!
>>>>>>          ffff80008030cb08:       910003fd        mov     x29, sp
>>>>>>          ffff80008030cb0c:       97faba67        bl 
>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>              perf 12720/12720 [0001]      5986.372298040 
>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h   64 
>>>>>> return sched_clock();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 
>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 
>>>>>> period: 1 time: 5986372298040 }
>>>>>>          ffff8000801bb4a8 <sched_clock>:
>>>>>>          ffff8000801bb4a8:       d503233f        paciasp
>>>>>>          ffff8000801bb4ac:       a9be7bfd        stp     x29, x30, 
>>>>>> [sp, #-32]!
>>>>>>          ffff8000801bb4b0:       910003fd        mov     x29, sp
>>>>>>          ffff8000801bb4b4:       a90153f3        stp     x19, x20, 
>>>>>> [sp, #16]
>>>>>>          ffff8000801bb4b8:       d5384113        mrs     x19, sp_el0
>>>>>>          ffff8000801bb4bc:       b9401260        ldr     w0, [x19, 
>>>>>> #16]
>>>>>>          ffff8000801bb4c0:       11000400        add     w0, w0, #0x1
>>>>>>          ffff8000801bb4c4:       b9001260        str     w0, [x19, 
>>>>>> #16]
>>>>>>          ffff8000801bb4c8:       94427cf8        bl 
>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>              perf 12720/12720 [0001]      5986.372298040 
>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c  105 ns 
>>>>>> = sched_clock_noinstr();
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 
>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 
>>>>>> period: 1 time: 5986372298040 }
>>>>>>          ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>          ffff80008125a8a8:       d503233f        paciasp
>>>>>>          ffff80008125a8ac:       a9bc7bfd        stp     x29, x30, 
>>>>>> [sp, #-64]!
>>>>>>          ffff80008125a8b0:       910003fd        mov     x29, sp
>>>>>>          ffff80008125a8b4:       a90153f3        stp     x19, x20, 
>>>>>> [sp, #16]
>>>>>>          ffff80008125a8b8:       b000e354        adrp    x20, 
>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>          ffff80008125a8bc:       910d0294        add     x20, x20, 
>>>>>> #0x340
>>>>>>          ffff80008125a8c0:       a90363f7        stp     x23, x24, 
>>>>>> [sp, #48]
>>>>>>          ffff80008125a8c4:       91002297        add     x23, x20, 
>>>>>> #0x8
>>>>>>          ffff80008125a8c8:       52800518        mov     w24, 
>>>>>> #0x28                  // #40
>>>>>>          ffff80008125a8cc:       a9025bf5        stp     x21, x22, 
>>>>>> [sp, #32]
>>>>>>          ffff80008125a8d0:       b9400296        ldr     w22, [x20]
>>>>>>          ffff80008125a8d4:       120002d5        and     w21, w22, 
>>>>>> #0x1
>>>>>>          ffff80008125a8d8:       9bb87eb5        umull   x21, w21, 
>>>>>> w24
>>>>>>          ffff80008125a8dc:       8b1502f3        add     x19, x23, 
>>>>>> x21
>>>>>>          ffff80008125a8e0:       f9400e60        ldr     x0, [x19, 
>>>>>> #24]
>>>>>>          ffff80008125a8e4:       d63f0000        blr     x0
>>>>>
>>>>> It looks like the disassembly now assumes this BLR wasn't taken. We 
>>>>> go from ffff80008125a8e4 straight through to ...
>>>>>
>>>>>>              perf 12720/12720 [0001]      5986.372298040 
>>>>>> sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 
>>>>>> 93                 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>> Event type: branches
>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 
>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 
>>>>>> period: 1 time: 5986372298040 }
>>>>>>          ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>          ffff80008125a8e8:       f8756ae3        ldr     x3, [x23, 
>>>>>> x21]
>>>>>
>>>>> ffff80008125a8e4 which is just the previous one +4. Isn't your 
>>>>> issue actually a decode issue in Perf itself? Why is there a 
>>>>> discontinuity without branch samples being generated where either 
>>>>> the source or destination address is 0?
>>>>>
>>>>> What are your record options to create this issue? As I mentioned 
>>>>> in the previous reply I haven't been able to reproduce it.
>>>>
>>>> I am using below perf record command.
>>>>
>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>
>>>
>>> Thanks I managed to reproduce it. I'll take a look to see if I think 
>>> the issue is somewhere else.
>>>
>>
>> At least for the failures I encountered, the issue is due to the 
>> alternatives runtime instruction patching mechanism. vmlinux ends up 
>> being the wrong image to decode with because a load of branches are 
>> actually turned into nops.
>>
>> Can you confirm if you use --kcore instead of vmlinux that you still 
>> get failures:
>>
>>    sudo perf record -e cs_etm// -C 1 --kcore -o <output-folder.data> -- \
>>      dd if=/dev/zero of=/dev/null
>>
>>     perf script -i <output-folder.data> \
>>      tools/perf/scripts/python/arm-cs-trace-disasm.py -d llvm-objdump \
>>      -k <output-folder.data>/kcore_dir/kcore
>>
> 
> With below command combination with kcore also the issue is seen, as 
> reported in this email chain.
> 
> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
> dd if=/dev/zero of=/dev/null
> 
> ./perf script -i kcore/data \
> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
> -d objdump -k kcore/kcore_dir/kcore
> 
> 
> However, with below sequence(same as your command) the issue is *not* seen.
> 
> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
> dd if=/dev/zero of=/dev/null
> 
> ./perf script -i kcore/data ./scripts/python/arm-cs-trace-disasm.py \
> -- -d objdump -k kcore/kcore_dir/kcore
> 
> Do you see any issue with the command, which is showing the problem?
> Also the output log produced by these both commands is different.
> 

Double check the command I gave. "-i" needs to be the same as "-o" (it's 
the folder, not the data file). I think this could be causing your 
issue. Unless you give it the folder it doesn't open kcore along with 
the data file.

> The below diff that you have shared has no effect on the failing case.
> 
>> But I still think bad decode detection should be moved as much as 
>> possible into OpenCSD and Perf rather than this script. Otherwise 
>> every tool will have to re-implement it, and OpenCSD has a lot more 
>> info to make decisions with.
>>
>> One change we can make is to desynchronize when an N atom is an 
>> unconditional branch:
>>
>>   diff --git a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp 
>> b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>> index c557998..3eefd5d 100644
>> --- a/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>> +++ b/decoder/source/etmv4/trc_pkt_decode_etmv4i.cpp
>> @@ -1341,6 +1341,14 @@ ocsd_err_t 
>> TrcPktDecodeEtmV4I::processAtom(const ocsd_atm_val atom)
>>           //  save recorded next instuction address
>>           ocsd_vaddr_t nextAddr = m_instr_info.instr_addr;
>>
>> +        // must have lost sync if an unconditional branch wasn't taken
>> +        if (atom == ATOM_N && !m_instr_info.is_conditional) {
>> +             m_need_addr = true;
>> +             m_out_elem.addElemType(m_index_curr_pkt, 
>> OCSD_GEN_TRC_ELEM_NO_SYNC);
>> +             // wait for next address
>> +             return OCSD_OK;
>> +        }
>> +
>>
>> Another one we can spot is when a new address comes that is before the 
>> current decode address (basically the backwards check that you added).
>>
>> There are probably others that can be spotted like an address 
>> appearing after a direct branch that doesn't match the branch target.
>>
>> I think at that point, desynchronising should cause the disassembly 
>> script to throw away the last bit, rather than force it to be printed 
>> as in this patch. As I mentioned above in the thread, it leads to 
>> printing disassembly that's implausible and misleading (where an 
>> unconditional branch wasn't taken).
> 
> Thanks,
> Ganapat

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ