Date:   Mon, 28 Jun 2021 20:12:17 +0800
From:   Leo Yan <leo.yan@...aro.org>
To:     James Clark <james.clark@....com>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        John Garry <john.garry@...wei.com>,
        Will Deacon <will@...nel.org>,
        Mathieu Poirier <mathieu.poirier@...aro.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Dave Martin <Dave.Martin@....com>, Al Grant <Al.Grant@....com>,
        linux-arm-kernel@...ts.infradead.org,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 5/5] perf arm-spe: Don't wait for PERF_RECORD_EXIT event

On Fri, Jun 25, 2021 at 02:25:15PM +0100, James Clark wrote:
> 
> 
> On 19/05/2021 08:19, Leo Yan wrote:
> > When decoding Arm SPE trace, perf waits for the PERF_RECORD_EXIT event
> > (the last perf event) before processing trace data, which is needless
> > and might even cause logic errors, e.g. it might fail to correlate perf
> > events with Arm SPE events correctly.
> > 
> > So this patch removes the check for the PERF_RECORD_EXIT event.
> > 
> > Signed-off-by: Leo Yan <leo.yan@...aro.org>
> > ---
> >  tools/perf/util/arm-spe.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> > index 5c5b438584c4..58b7069c5a5f 100644
> > --- a/tools/perf/util/arm-spe.c
> > +++ b/tools/perf/util/arm-spe.c
> > @@ -717,11 +717,7 @@ static int arm_spe_process_event(struct perf_session *session,
> >  					sample->time);
> >  		}
> >  	} else if (timestamp) {
> > -		if (event->header.type == PERF_RECORD_EXIT) {
> > -			err = arm_spe_process_queues(spe, timestamp);
> > -			if (err)
> > -				return err;
> > -		}
> > +		err = arm_spe_process_queues(spe, timestamp);
> >  	}
> >  
> >  	return err;
> > 
> 
> For the whole set:
> Reviewed-by: James Clark <james.clark@....com>
> Tested-by: James Clark <james.clark@....com>
> 
> I see a big improvement in decoding involving multiple processes because
> the timestamps are now correlated with the comm and mmap events.
> 
> For example, perf-exec samples are visible right before the exec is done,
> and on an application that forks, samples are visible from all processes:
> 
>    perf record -e arm_spe// -- bash -c "stress -c 1"
>    perf script
> 
>    perf-exec  4502 [003] 259755.050409:          1    l1d-access:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
>    perf-exec  4502 [003] 259755.050409:          1    tlb-access:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
>    perf-exec  4502 [003] 259755.050409:          1        memory:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
>    perf-exec  4502 [003] 259755.050411:          1    tlb-access:  ffff800010120fb8 __rcu_read_lock+0x0 ([kernel.kallsyms])
>    bash  4502 [003] 259755.050411:          1   branch-miss:  ffff8000105b2a40 memcpy+0x80 ([kernel.kallsyms])
>    bash  4502 [003] 259755.050411:          1    tlb-access:                 0 [unknown] ([unknown])
>    ...
>    stress  4502 [003] 259755.051468:          1    l1d-access:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
>    stress  4502 [003] 259755.051468:          1    tlb-access:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
>    stress  4502 [003] 259755.051468:          1        memory:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
> 
> Previously samples were only attributed to 'stress', which was obviously wrong.

Thanks a lot for the review and testing, James!

Hi Arnaldo, I confirmed this patch set can be cleanly applied on
the latest acme/perf/core branch, so could you pick up this patch
set?

Thanks,
Leo
