[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4ba157c2-4a56-4d77-9a15-071e46adc33b@os.amperecomputing.com>
Date: Wed, 7 Aug 2024 21:48:31 +0530
From: Ganapatrao Kulkarni <gankulkarni@...amperecomputing.com>
To: James Clark <james.clark@...aro.org>, scclevenger@...amperecomputing.com
Cc: acme@...hat.com, coresight@...ts.linaro.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
darren@...amperecomputing.com, james.clark@....com, suzuki.poulose@....com,
Leo Yan <leo.yan@....com>, Al.Grant@....com,
Mike Leach <mike.leach@...aro.org>
Subject: Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm
if address continuity is broken
On 07-08-2024 08:23 pm, James Clark wrote:
>
>
> On 07/08/2024 1:17 pm, Ganapatrao Kulkarni wrote:
>>
>>
>> On 06-08-2024 09:44 pm, James Clark wrote:
>>>
>>>
>>> On 06/08/2024 4:02 pm, Steve Clevenger wrote:
>>>>
>>>>
>>>> On 8/6/2024 2:57 AM, James Clark wrote:
>>>>>
>>>>>
>>>>> On 06/08/2024 10:47 am, James Clark wrote:
>>>>>>
>>>>>>
>>>>>> On 06/08/2024 8:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 05-08-2024 07:29 pm, James Clark wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/08/2024 1:22 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 01-08-2024 03:30 pm, James Clark wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 24/07/2024 3:45 pm, James Clark wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 24/07/2024 7:38 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 23-07-2024 09:16 pm, James Clark wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi James,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>>>>>>>>>>>>>> To generate the instruction tracing, script uses 2
>>>>>>>>>>>>>>>>>> contiguous packets
>>>>>>>>>>>>>>>>>> address range. If there a continuity brake due to
>>>>>>>>>>>>>>>>>> discontiguous branch
>>>>>>>>>>>>>>>>>> address, it is required to reset the tracing and start
>>>>>>>>>>>>>>>>>> tracing with the
>>>>>>>>>>>>>>>>>> new set of contiguous packets.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Adding change to identify the break and complete the
>>>>>>>>>>>>>>>>>> remaining tracing
>>>>>>>>>>>>>>>>>> of current packets and restart tracing from new set of
>>>>>>>>>>>>>>>>>> packets, if
>>>>>>>>>>>>>>>>>> continuity is established.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Ganapatrao,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can you add a before and after example of what's
>>>>>>>>>>>>>>>>> changed to
>>>>>>>>>>>>>>>>> the commit message? It wasn't immediately obvious to me if
>>>>>>>>>>>>>>>>> this is adding missing output, or it was correcting the
>>>>>>>>>>>>>>>>> tail end of the output that was previously wrong.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It is adding tail end of the trace as well avoiding the
>>>>>>>>>>>>>>>> segfault of the perf application. With out this change the
>>>>>>>>>>>>>>>> perf segfaults with as below log
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ./perf script
>>>>>>>>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py --
>>>>>>>>>>>>>>>> -d objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>>>>>> objdump: error: the stop address should be after the start
>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line
>>>>>>>>>>>>>>>> 271,
>>>>>>>>>>>>>>>> in process_event
>>>>>>>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr,
>>>>>>>>>>>>>>>> stop_addr)
>>>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line
>>>>>>>>>>>>>>>> 105,
>>>>>>>>>>>>>>>> in print_disam
>>>>>>>>>>>>>>>> for line in read_disam(dso_fname, dso_start,
>>>>>>>>>>>>>>>> start_addr, stop_addr):
>>>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line
>>>>>>>>>>>>>>>> 99,
>>>>>>>>>>>>>>>> in read_disam
>>>>>>>>>>>>>>>> disasm_output =
>>>>>>>>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line
>>>>>>>>>>>>>>>> 466, in
>>>>>>>>>>>>>>>> check_output
>>>>>>>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>>>>>>>>> check=True,
>>>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line
>>>>>>>>>>>>>>>> 571, in run
>>>>>>>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d',
>>>>>>>>>>>>>>>> '-z', '--start-address=0xffff80008125b758',
>>>>>>>>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']'
>>>>>>>>>>>>>>>> returned non-zero exit status 1.
>>>>>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python
>>>>>>>>>>>>>>>> trace event handler
>>>>>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>>>>>>>>>>>>>> <no Python frame>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>>>>>>>>>>>>>>> <gankulkarni@...amperecomputing.com>
>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10
>>>>>>>>>>>>>>>>>> ++++++++++
>>>>>>>>>>>>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>>>> a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do you need to write into the global cpu_data here?
>>>>>>>>>>>>>>>>> Doesn't
>>>>>>>>>>>>>>>>> it get overwritten after you load it back into 'prev_ip'
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, the logic is same as holding the addr of previous
>>>>>>>>>>>>>>>> packet.
>>>>>>>>>>>>>>>> Saving the previous packet saved ip in to prev_ip before
>>>>>>>>>>>>>>>> overwriting with the current packet.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It's not exactly the same logic as holding the addr of the
>>>>>>>>>>>>>>> previous sample. For addr, we return on the first None, with
>>>>>>>>>>>>>>> your change we now "pretend" that the second one is also the
>>>>>>>>>>>>>>> previous one:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>>> return <----------------------------sample 0 return
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>>>>>>>>>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1
>>>>>>>>>>>>>>> save but
>>>>>>>>>>>>>>> no return
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then for sample 1 'prev_ip' is actually now the 'current'
>>>>>>>>>>>>>>> IP:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, it is dummy for first packet. Added anticipating that we
>>>>>>>>>>>>>> wont hit the discontinuity for the first packet itself.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can this be changed to more intuitive like below?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>> index d973c2baed1c..d49f5090059f 100755
>>>>>>>>>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>>>>>>>>>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>>>>>>>>>>>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if (options.verbose == True):
>>>>>>>>>>>>>> print("Event type: %s" % name)
>>>>>>>>>>>>>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # Record for previous sample packet
>>>>>>>>>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>>>>>>>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and
>>>>>>>>>>>>>> stop_addr=4
>>>>>>>>>>>>>> if (start_addr == 0 and stop_addr == 4):
>>>>>>>>>>>>>> print("CPU%d: CS_ETM_TRACE_ON packet is
>>>>>>>>>>>>>> inserted" % cpu)
>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + if (stop_addr < start_addr and prev_ip != 0):
>>>>>>>>>>>>>> + # Continuity of the Packets broken, set
>>>>>>>>>>>>>> start_addr to previous
>>>>>>>>>>>>>> + # packet ip to complete the remaining tracing
>>>>>>>>>>>>>> of the address range.
>>>>>>>>>>>>>> + start_addr = prev_ip
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> if (start_addr < int(dso_start) or start_addr >
>>>>>>>>>>>>>> int(dso_end)):
>>>>>>>>>>>>>> print("Start address 0x%x is out of range [
>>>>>>>>>>>>>> 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start),
>>>>>>>>>>>>>> int(dso_end), dso))
>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Without this patch below is the failure log(with segfault)
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> reference.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root@...01sys-r214 perf]# timeout 4s ./perf record -e
>>>>>>>>>>>>>> cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>>>>>>>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>>>>>>>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>>>>>>>>>>>>>> [root@...01sys-r214 perf]# ./perf script
>>>>>>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>>>>>>>>>>>> objdump -k ../../vmlinux -v $* > dump
>>>>>>>>>>>>>> objdump: error: the stop address should be after the start
>>>>>>>>>>>>>> address
>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271,
>>>>>>>>>>>>>> in process_event
>>>>>>>>>>>>>> print_disam(dso_fname, dso_vm_start, start_addr,
>>>>>>>>>>>>>> stop_addr)
>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105,
>>>>>>>>>>>>>> in print_disam
>>>>>>>>>>>>>> for line in read_disam(dso_fname, dso_start,
>>>>>>>>>>>>>> start_addr,
>>>>>>>>>>>>>> stop_addr):
>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>> File "./scripts/python/arm-cs-trace-disasm.py", line
>>>>>>>>>>>>>> 99, in
>>>>>>>>>>>>>> read_disam
>>>>>>>>>>>>>> disasm_output =
>>>>>>>>>>>>>> check_output(disasm).decode('utf-8').split('\n')
>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>>>>>>>>>>>> check_output
>>>>>>>>>>>>>> return run(*popenargs, stdout=PIPE, timeout=timeout,
>>>>>>>>>>>>>> check=True,
>>>>>>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>>>>>> File "/usr/lib64/python3.12/subprocess.py", line 571,
>>>>>>>>>>>>>> in run
>>>>>>>>>>>>>> raise CalledProcessError(retcode, process.args,
>>>>>>>>>>>>>> subprocess.CalledProcessError: Command '['objdump', '-d',
>>>>>>>>>>>>>> '-z', '--start-address=0xffff80008125b758',
>>>>>>>>>>>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']'
>>>>>>>>>>>>>> returned non-zero exit status 1.
>>>>>>>>>>>>>> Fatal Python error: handler_call_die: problem in Python trace
>>>>>>>>>>>>>> event handler
>>>>>>>>>>>>>> Python runtime state: initialized
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>>>>>>>>>>>>>> <no Python frame>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Extension modules: perf_trace_context, systemd._journal,
>>>>>>>>>>>>>> systemd._reader, systemd.id128, report._py3report,
>>>>>>>>>>>>>> _dbus_bindings, problem._py3abrt (total: 7)
>>>>>>>>>>>>>> Aborted (core dumped)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> dump snippet:
>>>>>>>>>>>>>> ============
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff800080313f04
>>>>>>>>>>>>>> <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>>>> ffff800080313f04: 36100094 tbz
>>>>>>>>>>>>>> w20,
>>>>>>>>>>>>>> #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0,
>>>>>>>>>>>>>> [x21, #968]
>>>>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>>>>>>>> event->clock();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-16]!
>>>>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl
>>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>>>>>>>> return sched_clock();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-32]!
>>>>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> sp_el0
>>>>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0,
>>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0,
>>>>>>>>>>>>>> w0, #0x1
>>>>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0,
>>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl
>>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c
>>>>>>>>>>>>>> 105
>>>>>>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-64]!
>>>>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp
>>>>>>>>>>>>>> x20,
>>>>>>>>>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>>>> ffff80008125a8bc: 910d0294 add
>>>>>>>>>>>>>> x20,
>>>>>>>>>>>>>> x20, #0x340
>>>>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp
>>>>>>>>>>>>>> x23,
>>>>>>>>>>>>>> x24, [sp, #48]
>>>>>>>>>>>>>> ffff80008125a8c4: 91002297 add
>>>>>>>>>>>>>> x23,
>>>>>>>>>>>>>> x20, #0x8
>>>>>>>>>>>>>> ffff80008125a8c8: 52800518 mov
>>>>>>>>>>>>>> w24,
>>>>>>>>>>>>>> #0x28 // #40
>>>>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp
>>>>>>>>>>>>>> x21,
>>>>>>>>>>>>>> x22, [sp, #32]
>>>>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr
>>>>>>>>>>>>>> w22,
>>>>>>>>>>>>>> [x20]
>>>>>>>>>>>>>> ffff80008125a8d4: 120002d5 and
>>>>>>>>>>>>>> w21,
>>>>>>>>>>>>>> w22, #0x1
>>>>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull
>>>>>>>>>>>>>> x21,
>>>>>>>>>>>>>> w21, w24
>>>>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x23, x21
>>>>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0,
>>>>>>>>>>>>>> [x19, #24]
>>>>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93
>>>>>>>>>>>>>> cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With fix:
>>>>>>>>>>>>>> =========
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff800080313f04
>>>>>>>>>>>>>> <__perf_event_header__init_id+0x4c>:
>>>>>>>>>>>>>> ffff800080313f04: 36100094 tbz
>>>>>>>>>>>>>> w20,
>>>>>>>>>>>>>> #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
>>>>>>>>>>>>>> ffff800080313f08: f941e6a0 ldr x0,
>>>>>>>>>>>>>> [x21, #968]
>>>>>>>>>>>>>> ffff800080313f0c: d63f0000 blr x0
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> __perf_event_header__init_id+0x54
>>>>>>>>>>>>>> .../coresight/linux/kernel/events/core.c 586 return
>>>>>>>>>>>>>> event->clock();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff80008030cb00 <local_clock>:
>>>>>>>>>>>>>> ffff80008030cb00: d503233f paciasp
>>>>>>>>>>>>>> ffff80008030cb04: a9bf7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-16]!
>>>>>>>>>>>>>> ffff80008030cb08: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff80008030cb0c: 97faba67 bl
>>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>>>>>>>>>>>>>> return sched_clock();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff8000801bb4a8 <sched_clock>:
>>>>>>>>>>>>>> ffff8000801bb4a8: d503233f paciasp
>>>>>>>>>>>>>> ffff8000801bb4ac: a9be7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-32]!
>>>>>>>>>>>>>> ffff8000801bb4b0: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff8000801bb4b4: a90153f3 stp
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>>> ffff8000801bb4b8: d5384113 mrs
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> sp_el0
>>>>>>>>>>>>>> ffff8000801bb4bc: b9401260 ldr w0,
>>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>>> ffff8000801bb4c0: 11000400 add w0,
>>>>>>>>>>>>>> w0, #0x1
>>>>>>>>>>>>>> ffff8000801bb4c4: b9001260 str w0,
>>>>>>>>>>>>>> [x19, #16]
>>>>>>>>>>>>>> ffff8000801bb4c8: 94427cf8 bl
>>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c
>>>>>>>>>>>>>> 105
>>>>>>>>>>>>>> ns = sched_clock_noinstr();
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff80008125a8a8 <sched_clock_noinstr>:
>>>>>>>>>>>>>> ffff80008125a8a8: d503233f paciasp
>>>>>>>>>>>>>> ffff80008125a8ac: a9bc7bfd stp
>>>>>>>>>>>>>> x29,
>>>>>>>>>>>>>> x30, [sp, #-64]!
>>>>>>>>>>>>>> ffff80008125a8b0: 910003fd mov x29, sp
>>>>>>>>>>>>>> ffff80008125a8b4: a90153f3 stp
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x20, [sp, #16]
>>>>>>>>>>>>>> ffff80008125a8b8: b000e354 adrp
>>>>>>>>>>>>>> x20,
>>>>>>>>>>>>>> ffff800082ec3000 <tick_bc_dev+0x140>
>>>>>>>>>>>>>> ffff80008125a8bc: 910d0294 add
>>>>>>>>>>>>>> x20,
>>>>>>>>>>>>>> x20, #0x340
>>>>>>>>>>>>>> ffff80008125a8c0: a90363f7 stp
>>>>>>>>>>>>>> x23,
>>>>>>>>>>>>>> x24, [sp, #48]
>>>>>>>>>>>>>> ffff80008125a8c4: 91002297 add
>>>>>>>>>>>>>> x23,
>>>>>>>>>>>>>> x20, #0x8
>>>>>>>>>>>>>> ffff80008125a8c8: 52800518 mov
>>>>>>>>>>>>>> w24,
>>>>>>>>>>>>>> #0x28 // #40
>>>>>>>>>>>>>> ffff80008125a8cc: a9025bf5 stp
>>>>>>>>>>>>>> x21,
>>>>>>>>>>>>>> x22, [sp, #32]
>>>>>>>>>>>>>> ffff80008125a8d0: b9400296 ldr
>>>>>>>>>>>>>> w22,
>>>>>>>>>>>>>> [x20]
>>>>>>>>>>>>>> ffff80008125a8d4: 120002d5 and
>>>>>>>>>>>>>> w21,
>>>>>>>>>>>>>> w22, #0x1
>>>>>>>>>>>>>> ffff80008125a8d8: 9bb87eb5 umull
>>>>>>>>>>>>>> x21,
>>>>>>>>>>>>>> w21, w24
>>>>>>>>>>>>>> ffff80008125a8dc: 8b1502f3 add
>>>>>>>>>>>>>> x19,
>>>>>>>>>>>>>> x23, x21
>>>>>>>>>>>>>> ffff80008125a8e0: f9400e60 ldr x0,
>>>>>>>>>>>>>> [x19, #24]
>>>>>>>>>>>>>> ffff80008125a8e4: d63f0000 blr x0
>>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like the disassembly now assumes this BLR wasn't
>>>>>>>>>>>>> taken. We go from ffff80008125a8e4 straight through to ...
>>>>>>>>>>>>>
>>>>>>>>>>>>>> perf 12720/12720 [0001] 5986.372298040
>>>>>>>>>>>>>> sched_clock_noinstr+0x3c
>>>>>>>>>>>>>> ...sight/linux/kernel/time/sched_clock.c 93
>>>>>>>>>>>>>> cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>>>>>>>>>>>>>> Event type: branches
>>>>>>>>>>>>>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>>>>>>>>>>>>>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid:
>>>>>>>>>>>>>> 12720 period: 1 time: 5986372298040 }
>>>>>>>>>>>>>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>>>>>>>>>>>>>> ffff80008125a8e8: f8756ae3 ldr x3,
>>>>>>>>>>>>>> [x23, x21]
>>>>>>>>>>>>>
>>>>>>>>>>>>> ffff80008125a8e4 which is just the previous one +4. Isn't your
>>>>>>>>>>>>> issue actually a decode issue in Perf itself? Why is there a
>>>>>>>>>>>>> discontinuity without branch samples being generated where
>>>>>>>>>>>>> either the source or destination address is 0?
>>>>>>>>>>>>>
>>>>>>>>>>>>> What are your record options to create this issue? As I
>>>>>>>>>>>>> mentioned in the previous reply I haven't been able to
>>>>>>>>>>>>> reproduce it.
>>>>>>>>>>>>
>>>>>>>>>>>> I am using below perf record command.
>>>>>>>>>>>>
>>>>>>>>>>>> timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero
>>>>>>>>>>>> of=/dev/null
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks I managed to reproduce it. I'll take a look to see if I
>>>>>>>>>>> think the issue is somewhere else.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> At least for the failures I encountered, the issue is due to the
>>>>>>>>>> alternatives runtime instruction patching mechanism. vmlinux ends
>>>>>>>>>> up being the wrong image to decode with because a load of
>>>>>>>>>> branches
>>>>>>>>>> are actually turned into nops.
>>>>>>>>>>
>>>>>>>>>> Can you confirm if you use --kcore instead of vmlinux that you
>>>>>>>>>> still get failures:
>>>>>>>>>>
>>>>>>>>>> sudo perf record -e cs_etm// -C 1 --kcore -o
>>>>>>>>>> <output-folder.data> -- \
>>>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>>>
>>>>>>>>>> perf script -i <output-folder.data> \
>>>>>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py -d
>>>>>>>>>> llvm-objdump \
>>>>>>>>>> -k <output-folder.data>/kcore_dir/kcore
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> With below command combination with kcore also the issue is seen,
>>>>>>>>> as reported in this email chain.
>>>>>>>>>
>>>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>>
>>>>>>>>> ./perf script -i kcore/data \
>>>>>>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- \
>>>>>>>>> -d objdump -k kcore/kcore_dir/kcore
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> However, with below sequence(same as your command) the issue is
>>>>>>>>> *not* seen.
>>>>>>>>>
>>>>>>>>> timeout 8s ./perf record -e cs_etm// -C 1 --kcore -o kcore \
>>>>>>>>> dd if=/dev/zero of=/dev/null
>>>>>>>>>
>>>>>>>>> ./perf script -i kcore/data
>>>>>>>>> ./scripts/python/arm-cs-trace-disasm.py \
>>>>>>>>> -- -d objdump -k kcore/kcore_dir/kcore
>>>>>>>>>
>>>>>>>>> Do you see any issue with the command, which is showing the
>>>>>>>>> problem?
>>>>>>>>> Also the output log produced by these both commands is different.
>>>>>>>>>
>>>>>>
>>>>>> BTW are you running this on the target or somewhere else? It's
>>>>>> suspicious that "-i kcore/data" works at all because there is no
>>>>>> kernel image given to Perf. Unless you are running on the target and
>>>>>> then I think it will just open the one from /proc. Or maybe it uses
>>>>>> /boot/vmlinux by default which also wouldn't work.
>>>>>>
>>
>> Yes All tests are done natively on Ampere's ARM64 platform.
>> some more combination of commands which are also failing.
>>
>> ./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py --
>> -d objdump -k kcore/kcore_dir/kcore
>>
>> ./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -F
>> cpu,event,ip,addr,sym -- -d objdump -k kcore/kcore_dir/kcore
>>
>> ./perf script -i ./kcore scripts/python/arm-cs-trace-disasm.py -d
>> objdump -k kcore/kcore_dir/kcore
>>
>>>>>> Also the difference between "--script=python:" and just giving the
>>>>>> script name is in the parsing of the arguments following " -- ".
>>>>>> Sometimes they're also parsed as Perf arguments (like the -v becomes
>>>>>> perf verbose and -k becomes the Perf vmlinux rather than the script).
>>>>>>
>>>>>> I _think_ you want the " -- " when "--script" is used, and no "--"
>>>>>> when it's not. But there are some other combinations and you'll have
>>>>>> to debug it to compare your two exact scenarios to see why they're
>>>>>> different.
>>>>>>
>>>>>> But ignoring that issue with the argument format, you mentioned you
>>>>>> didn't see the issue any more with one version of --kcore. So I'm
>>>>>> assuming that confirms the issue is just a decode image issue, so we
>>>>>> shouldn't try to patch this script?
>>>>>>
>>>>>
>>>>> Although one change we should make to the script is change the example
>>>>> to use kcore. We can leave in one vmlinux one as an example if kcore
>>>>> isn't available, but add a note that it will fail if any patched
>>>>> code is
>>>>> traced (which is almost always).
>>>>
>>>> James, you may recall the year old thread
>>>> https://lore.kernel.org/all/ed8cea4c-a261-60ca-f4e1-333ec73cca8f@os.amperecomputing.com.
>>>> I described there an awkward workaround Ampere has used to solve the
>>>> patched code problem. At the time, it sounded like the maintainers were
>>>> interested in getting away from using the python script, mostly for
>>>> speed purposes. I didn't see the discussion go any further.
>>>>
>>>
>>> Oh yes thanks for the reminder. I wasn't thinking about the source
>>> code lines and debug symbols in this thread. I suppose your merging
>>> of kcore and vmlinux gives both the correct image and the symbols,
>>> but I was only focused on the image being correct, so only kcore was
>>> enough.
>>>
>>> It looks like everything we want to do from your previous thread is
>>> in addition to the fixes from this one. Even if we auto merge kcore +
>>> symbols and move the disassembly into Perf, we still want to detect
>>> decode issues earlier and not have IPs jumping backwards. Whether
>>> it's the script or Perf doing that the behavior should be the same.
>>>
>>
>> Since it is breaking the decode, can we please add as a fix to drop
>> the packets from decode when the discontinuity is seen (with warning
>> message in verbose mode)? like below diff?
>>
>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> @@ -257,6 +257,11 @@ def process_event(param_dict):
>> print("Stop address 0x%x is out of range [ 0x%x ..
>> 0x%x ] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
>> return
>>
>> + if (stop_addr < start_addr):
>> + if (options.verbose == True):
>> + print("Packet Dropped, Discontinuity detected
>> [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr,
>> dso))
>> + return
>> +
>
> I suppose my only concern with this is that it hides real errors and
> Perf shouldn't be outputting samples that go backwards. Considering that
> fixing this in OpenCSD and Perf has a much wider benefit I think that
> should be the ultimate goal. I'm putting this on my todo list for now
> (including Steve's merging idea).
>
> But in the mean time what about having a force option?
>
> > + if (stop_addr < start_addr):
> > + if (options.verbose == True or not options.force):
> > + print("Packet Dropped, Discontinuity detected
> > [stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr,
> > dso))
> > + if (not options.force):
> > + return
> > +
>
> That way in the future we'll still get reports if something new goes wrong.
Sure, Makes sense.
Is below diff with force option looks good?
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
b/tools/perf/scripts/python/arm-cs-trace-disasm.py
index d973c2baed1c..efe34f308beb 100755
--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -36,7 +36,10 @@ option_list = [
help="Set path to objdump executable file"),
make_option("-v", "--verbose", dest="verbose",
action="store_true", default=False,
- help="Enable debugging log")
+ help="Enable debugging log"),
+ make_option("-f", "--force", dest="force",
+ action="store_true", default=False,
+ help="Force decoder to continue")
]
parser = OptionParser(option_list=option_list)
@@ -257,6 +260,12 @@ def process_event(param_dict):
print("Stop address 0x%x is out of range [ 0x%x .. 0x%x
] for dso %s" % (stop_addr, int(dso_start), int(dso_end), dso))
return
+ if (stop_addr < start_addr):
+ if (options.verbose == True or options.force):
+ print("Packet Discontinuity detected
[stop_add:0x%x start_addr:0x%x ] for dso %s" % (stop_addr, start_addr, dso))
+ if (options.force):
+ return
+
if (options.objdump_name != None):
# It doesn't need to decrease virtual memory offset for
disassembly
# for kernel dso and executable file dso, so in this
case we set
>
>> if (options.objdump_name != None):
>> # It doesn't need to decrease virtual memory offset
>> for disassembly
>> # for kernel dso and executable file dso, so in this
>> case we set
>>
>>
>>> To summarise I think these are the changes to make:
>>>
>>> * Improve bad decode detection in OpenCSD
>>> * Get the script to auto merge kcore and vmlinux
>>> * Maybe we could get Perf to do this if both a kcore folder and -k
>>> vmlinux are used?
>>> * Improve the performance, either in the script or move more
>>> functionality into Perf
>>
Thanks,
Ganapat
Powered by blists - more mailing lists