Message-ID: <20250327153530.GF604566@e132581.arm.com>
Date: Tue, 1 Apr 2025 22:28:45 +0530
From: Tanmay Jagdale <tanmay@...vell.com>
To: <leo.yan@....com>, Tanmay Jagdale <tanmay@...vell.com>
CC: <suzuki.poulose@....com>, <mike.leach@...aro.org>,
<james.clark@...aro.org>, <john.g.garry@...cle.com>,
<leo.yan@...ux.dev>, <will@...nel.org>, <acme@...nel.org>,
<adrian.hunter@...el.com>, <linux-arm-kernel@...ts.infradead.org>,
<linux-perf-users@...r.kernel.org>, <coresight@...ts.linaro.org>,
<linux-kernel@...r.kernel.org>, <sgoutham@...vell.com>,
<gcherian@...vell.com>
Subject: Re: [PATCH V3 1/2] perf: cs-etm: Fixes in instruction sample synthesis
From: Leo Yan <leo.yan@....com>
Hi Leo,
I was on vacation, so I could not get back to you earlier.
>
> > Hi Tanmay,
> >
> > On Thu, Mar 27, 2025 at 04:41:48PM +0530, Tanmay Jagdale wrote:
>>> The existing method to synthesize instruction samples has the
>>> following issues:
>>> 1. Branch instruction mnemonics were being added to non-branch
>>> instructions too.
>>> 2. Branch target address was missing
>>>
>>> To fix the issues, start synthesizing the instructions from the
>>> previous packet (tidq->prev_packet) instead of current packet
>>> (tidq->packet). This way it's easy to figure out the target
>>> address of the branch instruction in tidq->prev_packet which
>>> is the current packet's (tidq->packet) first executed instruction.
>>>
>>> Since we have now switched to processing the previous packet
>>> first, we need not swap the packets during cs_etm__flush().
>>>
>>> Signed-off-by: Tanmay Jagdale <tanmay@...vell.com>
>>> Reviewed-by: James Clark <james.clark@....com>
>>
>> I saw James's reviewed tag. However, I have several comments.
>>
>> Sorry I jumped in too late.
No problem, thanks for the review.
>
>>> ---
>>> tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++-------
>>> 1 file changed, 25 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>>> index 0bf9e5c27b59..ebed5b98860e 100644
>>> --- a/tools/perf/util/cs-etm.c
>>> +++ b/tools/perf/util/cs-etm.c
>>> @@ -1576,10 +1576,26 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>
> Seems to me, the problem is cs_etm__synth_instruction_sample() is
> invoked from multiple callers.
>
> Both the previous packet and the current packet are valid for the flow:
> cs_etm__sample()
> `> cs_etm__synth_instruction_sample()
>
> Only the previous packet is valid and the current packet stores stale
> data for the flows:
>
> cs_etm__flush()
> `> cs_etm__synth_instruction_sample()
>
> cs_etm__end_block()
> `> cs_etm__synth_instruction_sample()
>
> First, as a prerequisite, I think we should resolve the stale data in
> the packet. So we need a fix like:
Agree.
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 0bf9e5c27b59..b7b17c0e4806 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -741,6 +741,9 @@ static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
>
> if (etm->synth_opts.branches || etm->synth_opts.last_branch ||
> etm->synth_opts.instructions) {
> + /* The previous packet will not be used, clean it up */
> + memset(tidq->prev_packet, 0x0, sizeof(*tidq->packet));
> +
> /*
> * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> * the next incoming packet.
>
Thanks for pointing this out, I'll include this fix.
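
The effect of clearing the stale packet before the swap can be sketched in isolation. This is only an illustration: `struct pkt`, `struct pkt_queue` and `queue_swap()` are hypothetical stand-ins, not the perf data structures, and the sketch assumes that a zeroed packet reads as "empty".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a trace packet. */
struct pkt {
	int      sample_type;	/* 0 means "empty" in this sketch */
	uint64_t start_addr;
	uint64_t end_addr;
};

/* Hypothetical queue holding the current and the previous packet. */
struct pkt_queue {
	struct pkt *packet;
	struct pkt *prev_packet;
};

/*
 * Zero the old previous packet, then swap the pointers. The buffer that
 * becomes "packet" therefore never carries data from two packets ago.
 */
static void queue_swap(struct pkt_queue *q)
{
	struct pkt *tmp;

	assert(q && q->packet && q->prev_packet);

	memset(q->prev_packet, 0, sizeof(*q->prev_packet));

	tmp = q->packet;
	q->packet = q->prev_packet;
	q->prev_packet = tmp;
}
```

After the swap, any code that reads `packet` before a new packet arrives sees an empty packet rather than leftover addresses.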
>>> sample.stream_id = etmq->etm->instructions_id;
>>> sample.period = period;
>>> sample.cpu = tidq->packet->cpu;
>
> Should we use "prev_packet->cpu" here?
>
> Even for a branch instruction, as its IP address is from the previous
> packet, we should use "prev_packet->cpu" for CPU ID as well.
ACK.
>
>>> - sample.flags = tidq->prev_packet->flags;
>>> sample.cpumode = event->sample.header.misc;
>>>
>>> - cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
>>> + cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet, &sample);
>>> +
>>> + /* Populate branch target information only when we encounter
>>> + * branch instruction, which is at the end of tidq->prev_packet.
>>> + */
>>> + if (addr == (tidq->prev_packet->end_addr - 4)) {
>
> if (addr && addr == cs_etm__last_executed_instr(tidq->prev_packet))
>
>>> + /* Update the perf_sample flags using the prev_packet
>>> + * since that is the queue we are synthesizing.
>>> + */
>>> + sample.flags = tidq->prev_packet->flags;
>>> +
>>> + /* The last instruction of the previous queue would be a
>>> + * branch operation. Get the target of that branch by looking
>>> + * into the first executed instruction of the current packet
>>> + * queue.
>>> + */
>>> + sample.addr = cs_etm__first_executed_instr(tidq->packet);
>
> Combined with the change suggested above for cleaning up the packet in
> cs_etm__packet_swap(), when execution reaches this point, if
> "tidq->packet" is a valid packet, this returns the branch target
> address; otherwise, it returns 0.
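
The branch-target rule discussed here can be sketched on its own. This is a minimal illustration under stated assumptions: `struct range_pkt` and the three helpers are hypothetical names, not the cs-etm.c functions, and an invalid (zeroed) packet is assumed to yield address 0. The last executed instruction is computed from a per-packet instruction size instead of a hard-coded 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified packet describing an executed address range. */
struct range_pkt {
	bool     valid;			/* false for a zeroed/empty packet */
	uint64_t start_addr;		/* first executed instruction */
	uint64_t end_addr;		/* one past the last executed instruction */
	uint8_t  last_instr_size;	/* size in bytes of the last instruction */
};

static uint64_t first_executed(const struct range_pkt *p)
{
	return p->valid ? p->start_addr : 0;
}

static uint64_t last_executed(const struct range_pkt *p)
{
	return p->valid ? p->end_addr - p->last_instr_size : 0;
}

/*
 * Return the branch target for a sample at 'addr' taken from 'prev'.
 * The result is 0 when 'addr' is not the final (branch) instruction of
 * 'prev', or when the following packet 'cur' holds no valid range.
 */
static uint64_t branch_target(uint64_t addr, const struct range_pkt *prev,
			      const struct range_pkt *cur)
{
	assert(prev && cur);

	if (addr && addr == last_executed(prev))
		return first_executed(cur);

	return 0;
}
```

With a range of 0x1000..0x1010 followed by a range starting at 0x2000, only the sample at 0x100c gets a target (0x2000); every other sample in the range, and any sample followed by an empty packet, gets 0.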
>
>>> + }
>>>
>>> if (etm->synth_opts.last_branch)
>>> sample.branch_stack = tidq->last_branch;
>>> @@ -1771,7 +1787,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>>> /* Get instructions remainder from previous packet */
>>> instrs_prev = tidq->period_instructions;
>>>
>>> - tidq->period_instructions += tidq->packet->instr_count;
>>> + tidq->period_instructions += tidq->prev_packet->instr_count;
>
> A side effect of this change is that synthesizing instruction samples
> for the _current_ packet is deferred: the packet is handled either when
> a new packet comes in or at the end of a trace chunk.
>
> The problem is with the latter case: cs_etm__end_block() and
> cs_etm__flush() both handle only the previous packet. As a result, the
> last packet will be ignored.
Yes I agree, this is a side effect of the patch. The last packet's instructions
are not handled.
>
> I would suggest we first fix this issue in cs_etm__end_block() and
> cs_etm__flush() (maybe we should consider consolidating the code with
> cs_etm__sample()).
Okay, sure. I will take a look at consolidating the code and post it in
the next version.
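
To make the side effect concrete, here is a minimal sketch of deferred accounting, assuming each packet is only counted once its successor arrives. `struct deferred_queue`, `on_packet()` and `on_flush()` are hypothetical names for illustration and do not mirror the actual cs-etm.c code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue state for deferred instruction accounting. */
struct deferred_queue {
	uint32_t pending_instrs;	/* count of the packet not yet accounted */
	uint64_t period_instrs;		/* instructions accounted so far */
};

/* A new packet arrives: account the previous packet, defer the new one. */
static void on_packet(struct deferred_queue *q, uint32_t instr_count)
{
	assert(q);

	q->period_instrs += q->pending_instrs;
	q->pending_instrs = instr_count;
}

/*
 * End of a trace chunk: no successor will arrive, so the pending packet
 * has to be accounted here, otherwise its instructions are dropped.
 */
static void on_flush(struct deferred_queue *q)
{
	assert(q);

	q->period_instrs += q->pending_instrs;
	q->pending_instrs = 0;
}
```

Feeding packets of 3 and 5 instructions leaves 3 accounted and 5 pending; only the flush brings the total to 8, which is the case a consolidated flush path would need to cover.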
>
>>> /*
>>> * Record a branch when the last instruction in
>>> @@ -1851,8 +1867,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>>> * been executed, but PC has not advanced to next
>>> * instruction)
>>> */
>>> + /* Get address from prev_packet since we are synthesizing
>>> + * that in cs_etm__synth_instruction_sample()
>>> + */
>>> addr = cs_etm__instr_addr(etmq, trace_chan_id,
>>> - tidq->packet, offset - 1);
>>> + tidq->prev_packet, offset - 1);
>>> ret = cs_etm__synth_instruction_sample(
>>> etmq, tidq, addr,
>>> etm->instructions_sample_period);
>>> @@ -1916,7 +1935,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>>>
>>> /* Handle start tracing packet */
>>> if (tidq->prev_packet->sample_type == CS_ETM_EMPTY)
>>> - goto swap_packet;
>>> + goto reset_last_br;
>>>
>>> if (etmq->etm->synth_opts.last_branch &&
>>> etmq->etm->synth_opts.instructions &&
>>> @@ -1952,8 +1971,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>>> return err;
>>> }
>>>
>>> -swap_packet:
>>> - cs_etm__packet_swap(etm, tidq);
>>> +reset_last_br:
>
> As said, if we consolidate cs_etm__flush() to process both the
> previous packet and the current packet, then we don't need to remove
> cs_etm__packet_swap() here, right?
Yes I think so too.
Thanks,
Tanmay
>
> Thanks,
> Leo
>