[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d6170beb-754e-4be3-8ff7-18acddccf077@linaro.org>
Date: Wed, 7 Aug 2024 15:53:57 +0100
From: James Clark <james.clark@...aro.org>
To: Ganapatrao Kulkarni <gankulkarni@...amperecomputing.com>,
scclevenger@...amperecomputing.com
Cc: acme@...hat.com, coresight@...ts.linaro.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
darren@...amperecomputing.com, james.clark@....com, suzuki.poulose@....com,
Leo Yan <leo.yan@....com>, Al.Grant@....com,
Mike Leach <mike.leach@...aro.org>
Subject: Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm
if address continuity is broken
On 07/08/2024 1:17 pm, Ganapatrao Kulkarni wrote:
>
>
> On 06-08-2024 09:44 pm, James Clark wrote:
>>
>>
>> On 06/08/2024 4:02 pm, Steve Clevenger wrote:
>>>
>>>
>>> On 8/6/2024 2:57 AM, James Clark wrote:
>>>>
>>>>
>>>> On 06/08/2024 10:47 am, James Clark wrote:
>>>>>
>>>>>
>>>>> On 06/08/2024 8:02 am, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>>
>>>>>> On 05-08-2024 07:29 pm, James Clark wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01-08-2024 03:30 pm, James Clark wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi James,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>>>> To generate the instruction tracing, script uses 2
>>>>>>>>>>>>>>>>> contiguous packets
>>>>>>>>>>>>>>>>> address range. If there a continuity brake due to
>>>>>>>>>>>>>>>>> discontiguous branch
>>>>>>>>>>>>>>>>> address, it is required to reset the tracing and start
>>>>>>>>>>>>>>>>> tracing with the
>>>>>>>>>>>>>>>>> new set of contiguous packets.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Adding change to identify the break and complete the
>>>>>>>>>>>>>>>>> remaining tracing
>>>>>>>>>>>>>>>>> of current packets and restart tracing from new set of
>>>>>>>>>>>>>>>>> packets, if
>>>>>>>>>>>>>>>>> continuity is established.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you add a before and after example of what's changed to
>>>>>>>>>>>>>>>> the commit message? It wasn't immediately obvious to me if
>>>>>>>>>>>>>>>> this is adding missing output, or it was correcting the
>>>>>>>>>>>>>>>> tail end of the output that was previously wrong.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is adding tail end of the trace as well avoiding the
>>>>>>>>>>>>>>> segfault of the perf application. With out this change the
>>>>>>>>>>>>>>> perf segfaults with as below log
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ./perf script
>>>>>>>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py --
>>>>>>>>>>>>>>> -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>>>>> objdump: error: the stop address should be after the start
>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line
>>>>>>>>>>>>>>> 271,
>>>>>>>>>>>>>>> in process_event
>>>>>>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr,
>>>>>>>>>>>>>>> stop_addr)
>>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line
>>>>>>>>>>>>>>> 105,
>>>>>>>>>>>>>>> in print_disam
>>>>>>>>>>>>>>> for line in read_disam(dso_fname, dso_start,
>>>>>>>>>>>>>>> start_addr, stop_addr):
>>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99,
>>>>>>>>>>>>>>> in read_disam
>>>>>>>>>>>>>>> disasm_output =
>>>>>>>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>>>>>>>>>> check_output
>>>>>>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>>>>>>>> check=True,
>>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571,
>>>>>>>>>>>>>>> in run
>>>>>>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d',
>>>>>>>>>>>>>>> '-z', '--start-address=0xffff80008125b758',
>>>>>>>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']'
>>>>>>>>>>>>>>> returned non-zero exit status 1.
>>>>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python
>>>>>>>>>>>>>>> trace event handler
>>>>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>>>>>>>> <no Python frame>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>>>>>>>>>>>>>> <gankulkarni@...amperecomputing.com>
>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10
>>>>>>>>>>>>>>>>> ++++++++++
>>>>>>>>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>>> a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do you need to write into the global cpu_data here? Doesn't
>>>>>>>>>>>>>>>> it get overwritten after you load it back into 'prev_ip'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No, the logic is same as holding the addr of previous
>>>>>>>>>>>>>>> packet.
>>>>>>>>>>>>>>> Saving the previous packet saved ip in to prev_ip before
>>>>>>>>>>>>>>> overwriting with the current packet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>>>>>>>> previous sample. For addr, we return on the first None, with
>>>>>>>>>>>>>> your change we now "pretend" that the second one is also the
>>>>>>>>>>>>>> previous one:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>> return <----------------------------sample 0 return
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>> no return
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, it is dummy for first packet. Added anticipating that we
>>>>>>>>>>>>> wont hit the discontinuity for the first packet itself.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can this be changed to more intuitive like below?
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>> return
>>>>>>>>>>>>>
>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>>>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>>>>>>>
>>>>>>>>>>>>> if (options.verbose == True):
>>>>>>>>>>>>> print("Event type: %s" % name)
>>>>>>>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Record for previous sample packet
>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and
>>>>>>>>>>>>> stop_addr=4
>>>>>>>>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>>>>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is
>>>>>>>>>>>>> inserted" % cpu)
>>>>>>>>>>>>> return
>>>>>>>>>>>>>
>>>>>>>>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>>>>>>>>> + # Continuity of the Packets broken, set
>>>>>>>>>>>>> start_addr to previous
>>>>>>>>>>>>> + # packet ip to complete the remaining tracing
>>>>>>>>>>>>> of the address range.
>>>>>>>>>>>>> + start_addr = prev_ip
>>>>>>>>>>>>> +
>>>>>>>>>>>>> if (start_addr < int(dso_start) or start_addr >
>>>>>>>>>>>>> int(dso_end)):
>>>>>>>>>>>>> print("Start address 0x%x is out of range [
>>>>>>>>>>>>> 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start),
>>>>>>>>>>>>> int(dso_end), dso))
>>>>>>>>>>>>> return
>>>>>>>>>>>>>
>>>>>>>>>>>>> Without this patch below is the failure log(with segfault) for
>>>>>>>>>>>>> reference.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [root@...01sys-r214 perf]# timeout 4s ./perf record -e
>>>>>>>>>>>>> cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>>>>>>>> [root@...01sys-r214 perf]# ./perf script
>>>>>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>>>>>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>>> objdump: error: the stop address should be after the start
>>>>>>>>>>>>> address
>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271,
>>>>>>>>>>>>> in process_event
>>>>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr,
>>>>>>>>>>>>> stop_addr)
>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105,
>>>>>>>>>>>>> in print_disam
>>>>>>>>>>>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>>>>>>>>>>>> stop_addr):
>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line
>>>>>>>>>>>>> 99, in
>>>>>>>>>>>>> read_disam
>>>>>>>>>>>>> disasm_output =
>>>>>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>>>>>>>> check_output
>>>>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>>>>>> check=True,
>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571,
>>>>>>>>>>>>> in run
>>>>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d',
>>>>>>>>>>>>> '-z', '--start-address=0xffff80008125b758',
>>>>>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']'
>>>>>>>>>>>>> returned non-zero exit status 1.
>>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace
>>>>>>>>>>>>> event handler
>>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>>
>>>>>>>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>>>>>>>> <no Python frame>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> dump snippet:
>>>>>>>>>>>>> ============
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid:
>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff800080313f04
>>>>>>>>>>>>> <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>>> ffff800080313f04: 36100094 tbz w20,
>>>>>>>>>>>>> #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0,
>>>>>>>>>>>>> [x21, #968]
>>>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>>>>>>> event->clock();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid:
>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29,
>>>>>>>>>>>>> x30, [sp, #-16]!
>>>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl
>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>>>>>>> return sched_clock();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid:
>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29,
>>>>>>>>>>>>> x30, [sp, #-32]!
>>>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19,
>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19,
>>>>>>>>>>>>> sp_el0
>>>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0,
>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0,
>>>>>>>>>>>>> w0, #0x1
>>>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0,
>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl
>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105
>>>>>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid:
>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29,
>>>>>>>>>>>>> x30, [sp, #-64]!
>>>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19,
>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20,
>>>>>>>>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>>> ffff80008125a8bc: 910d0294 add x20,
>>>>>>>>>>>>> x20, #0x340
>>>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23,
>>>>>>>>>>>>> x24, [sp, #48]
>>>>>>>>>>>>> ffff80008125a8c4: 91002297 add x23,
>>>>>>>>>>>>> x20, #0x8
>>>>>>>>>>>>> ffff80008125a8c8: 52800518 mov w24,
>>>>>>>>>>>>> #0x28 // #40
>>>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21,
>>>>>>>>>>>>> x22, [sp, #32]
>>>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22,
>>>>>>>>>>>>> [x20]
>>>>>>>>>>>>> ffff80008125a8d4: 120002d5 and w21,
>>>>>>>>>>>>> w22, #0x1
>>>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21,
>>>>>>>>>>>>> w21, w24
>>>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19,
>>>>>>>>>>>>> x23, x21
>>>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0,
>>>>>>>>>>>>> [x19, #24]
>>>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93
>>>>>>>>>>>>> cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid:
>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> With fix:
>>>>>>>>>>>>> =========
>>>>>>>>>>>>>
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid:
>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff800080313f04
>>>>>>>>>>>>> <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>>> ffff800080313f04: 36100094 tbz w20,
>>>>>>>>>>>>> #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0,
>>>>>>>>>>>>> [x21, #968]
>>>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>>>>>>> event->clock();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid:
>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp x29,
>>>>>>>>>>>>> x30, [sp, #-16]!
>>>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl
>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>>>>>>> return sched_clock();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid:
>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp x29,
>>>>>>>>>>>>> x30, [sp, #-32]!
>>>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp x19,
>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs x19,
>>>>>>>>>>>>> sp_el0
>>>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0,
>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0,
>>>>>>>>>>>>> w0, #0x1
>>>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0,
>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl
>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105
>>>>>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid:
>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp x29,
>>>>>>>>>>>>> x30, [sp, #-64]!
>>>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp x19,
>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp x20,
>>>>>>>>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>>> ffff80008125a8bc: 910d0294 add x20,
>>>>>>>>>>>>> x20, #0x340
>>>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp x23,
>>>>>>>>>>>>> x24, [sp, #48]
>>>>>>>>>>>>> ffff80008125a8c4: 91002297 add x23,
>>>>>>>>>>>>> x20, #0x8
>>>>>>>>>>>>> ffff80008125a8c8: 52800518 mov w24,
>>>>>>>>>>>>> #0x28 // #40
>>>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp x21,
>>>>>>>>>>>>> x22, [sp, #32]
>>>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr w22,
>>>>>>>>>>>>> [x20]
>>>>>>>>>>>>> ffff80008125a8d4: 120002d5 and w21,
>>>>>>>>>>>>> w22, #0x1
>>>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull x21,
>>>>>>>>>>>>> w21, w24
>>>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add x19,
>>>>>>>>>>>>> x23, x21
>>>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0,
>>>>>>>>>>>>> [x19, #24]
>>>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like the disassembly now assumes this BLR wasn't
>>>>>>>>>>>> taken. We go from ffff80008125a8e4 straight through to ...
>>>>>>>>>>>>
>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93
>>>>>>>>>>>>> cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid:
>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>>>>>>>> ffff80008125a8e8: f8756ae3 ldr x3,
>>>>>>>>>>>>> [x23, x21]
>>>>>>>>>>>>
>>>>>>>>>>>> ffff80008125a8e4 which is just the previous one +4. Isn't your
>>>>>>>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>>>>>>>> discontinuity without branch samples being generated where
>>>>>>>>>>>> either the source or destination address is 0?
>>>>>>>>>>>>
>>>>>>>>>>>> What are your record options to create this issue? As I
>>>>>>>>>>>> mentioned in the previous reply I haven't been able to
>>>>>>>>>>>> reproduce it.
>>>>>>>>>>>
>>>>>>>>>>> I am using below perf record command.
>>>>>>>>>>>
>>>>>>>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero
>>>>>>>>>>> of=/dev/null
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks I managed to reproduce it. I'll take a look to see if I
>>>>>>>>>> think the issue is somewhere else.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> At least for the failures I encountered, the issue is due to the
>>>>>>>>> alternatives runtime instruction patching mechanism. vmlinux ends
>>>>>>>>> up being the wrong image to decode with because a load of branches
>>>>>>>>> are actually turned into nops.
>>>>>>>>>
>>>>>>>>> Can you confirm if you use --kcore instead of vmlinux that you
>>>>>>>>> still get failures:
>>>>>>>>>
>>>>>>>>> sudo perf record -e cs_etm// -C 1 --kcore -o
>>>>>>>>> <output-folder.data> -- \
>>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>>
>>>>>>>>> perf script -i <output-folder.data> \
>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d
>>>>>>>>> llvm-objdump \
>>>>>>>>> -k <output-folder.data>/kcore_dir/kcore
>>>>>>>>>
>>>>>>>>
>>>>>>>> With below command combination with kcore also the issue is seen,
>>>>>>>> as reported in this email chain.
>>>>>>>>
>>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>
>>>>>>>> ./perf script -i kcore/data \
>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>>>>>>> -d objdump -k kcore/kcore_dir/kcore
>>>>>>>>
>>>>>>>>
>>>>>>>> However, with below sequence(same as your command) the issue is
>>>>>>>> *not* seen.
>>>>>>>>
>>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>
>>>>>>>> ./perf script -i kcore/data
>>>>>>>> ./scripts/python/arm-cs-trace-disasm.py \
>>>>>>>> -- -d objdump -k kcore/kcore_dir/kcore
>>>>>>>>
>>>>>>>> Do you see any issue with the command, which is showing the
>>>>>>>> problem?
>>>>>>>> Also the output log produced by these both commands is different.
>>>>>>>>
>>>>>
>>>>> BTW are you running this on the target or somewhere else? It's
>>>>> suspicious that "-i kcore/data" works at all because there is no
>>>>> kernel image given to Perf. Unless you are running on the target and
>>>>> then I think it will just open the one from /proc. Or maybe it uses
>>>>> /boot/vmlinux by default which also wouldn't work.
>>>>>
>
> Yes All tests are done natively on Ampere's ARM64 platform.
> some more combination of commands which are also failing.
>
> ./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -- -d
> objdump -k kcore/kcore_dir/kcore
>
> ./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -F
> cpu,event,ip,addr,sym -- -d objdump -k kcore/kcore_dir/kcore
>
> ./perf script -i ./kcore scripts/python/arm-cs-trace-disasm.py -d
> objdump -k kcore/kcore_dir/kcore
>
>>>>> Also the difference between "--script=python:" and just giving the
>>>>> script name is in the parsing of the arguments following " -- ".
>>>>> Sometimes they're also parsed as Perf arguments (like the -v becomes
>>>>> perf verbose and -k becomes the Perf vmlinux rather than the script).
>>>>>
>>>>> I _think_ you want the " -- " when "--script" is used, and no "--"
>>>>> when it's not. But there are some other combinations and you'll have
>>>>> to debug it to compare your two exact scenarios to see why they're
>>>>> different.
>>>>>
>>>>> But ignoring that issue with the argument format, you mentioned you
>>>>> didn't see the issue any more with one version of --kcore. So I'm
>>>>> assuming that confirms the issue is just a decode image issue, so we
>>>>> shouldn't try to patch this script?
>>>>>
>>>>
>>>> Although one change we should make to the script is change the example
>>>> to use kcore. We can leave in one vmlinux one as an example if kcore
>>>> isn't available, but add a note that it will fail if any patched
>>>> code is
>>>> traced (which is almost always).
>>>
>>> James, you may recall the year old thread
>>> https://lore.kernel.org/all/ed8cea4c-a261-60ca-f4e1-333ec73cca8f@os.amperecomputing.com.
>>> I described there an awkward workaround Ampere has used to solve the
>>> patched code problem. At the time, it sounded like the maintainers were
>>> interested in getting away from using the python script, mostly for
>>> speed purposes. I didn't see the discussion go any further.
>>>
>>
>> Oh yes thanks for the reminder. I wasn't thinking about the source
>> code lines and debug symbols in this thread. I suppose your merging of
>> kcore and vmlinux gives both the correct image and the symbols, but I
>> was only focused on the image being correct, so only kcore was enough.
>>
>> It looks like everything we want to do from your previous thread is in
>> addition to the fixes from this one. Even if we auto merge kcore +
>> symbols and move the disassembly into Perf, we still want to detect
>> decode issues earlier and not have IPs jumping backwards. Whether it's
>> the script or Perf doing that the behavior should be the same.
>>
>
> Since it is breaking the decode, can we please add as a fix to drop the
> packets from decode when the discontinuity is seen (with warning message
> in verbose mode)? like below diff?
>
> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> @@ -257,6 +257,11 @@ def process_event(param_dict):
> print("Stop address 0x%x is out of range [ 0x%x .. 0x%x
> ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
> return
>
> + if (stop_addr < start_addr):
> + if (options.verbose == True):
> + print("Packet Dropped, Discontinuity detected
> [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr,
> dso))
> + return
> +
I suppose my only concern with this is that it hides real errors and
Perf shouldn't be outputting samples that go backwards. Considering that
fixing this in OpenCSD and Perf has a much wider benefit I think that
should be the ultimate goal. I'm putting this on my todo list for now
(including Steve's merging idea).
But in the mean time what about having a force option?
> + if (stop_addr < start_addr):
> + if (options.verbose == True or not options.force):
> + print("Packet Dropped, Discontinuity detected
> [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr,
> dso))
> + if (not options.force):
> + return
> +
That way in the future we'll still get reports if something new goes wrong.
> if (options.objdump_name != None):
> # It doesn't need to decrease virtual memory offset for
> disassembly
> # for kernel dso and executable file dso, so in this
> case we set
>
>
>> To summarise I think these are the changes to make:
>>
>> * Improve bad decode detection in OpenCSD
>> * Get the script to auto merge kcore and vmlinux
>> * Maybe we could get Perf to do this if both a kcore folder and -k
>> vmlinux are used?
>> * Improve the performance, either in the script or move more
>> functionality into Perf
>
> Thanks,
> Ganapat
Powered by blists - more mailing lists