[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201008090654.GA656950@krava>
Date: Thu, 8 Oct 2020 11:06:54 +0200
From: Jiri Olsa <jolsa@...hat.com>
To: namhyung@...nel.org
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>,
Stephane Eranian <eranian@...gle.com>,
Ian Rogers <irogers@...gle.com>,
Al Grant <al.grant@...s.arm.com>,
Adrian Hunter <adrian.hunter@...el.com>
Subject: Re: [PATCH] perf inject: Flush ordered events on FINISHED_ROUND
On Tue, Oct 06, 2020 at 02:40:32PM +0900, namhyung@...nel.org wrote:
> On Tue, Oct 06, 2020 at 11:39:49AM +0900, namhyung@...nel.org wrote:
> > > > On Fri, Oct 02, 2020 at 10:03:17PM +0900, Namhyung Kim wrote:
> > > > > Below measures time and memory usage during the perf inject and
> > > > > report using ~190MB data file.
> > > > >
> > > > > Before:
> > > > > perf inject: 11.09 s, 382148 KB
> > > > > perf report: 8.05 s, 397440 KB
> > > > >
> > > > > After:
> > > > > perf inject: 16.24 s, 83376 KB
> > > > > perf report: 7.96 s, 216184 KB
> > > > >
> > > > > As you can see, it used 2x memory of the input size. I guess it's
> > > > > because it needs to keep the copy for the whole input. But I don't
> > > > > understand why processing time of perf inject increased..
> >
> > Measuring it with time shows:
> >
> > before after
> > real 11.309s 17.040s
> > user 8.084s 13.940s
> > sys 6.535s 6.732s
> >
> > So it's user space to make the difference. I've run perf record on
> > both (with cycles:U) and the dominant function is same: queue_event.
> > (46.98% vs 65.87%)
> >
> > It seems the flushing the queue makes more overhead on sorting.
>
> So I suspect the cache-miss ratio affects the performance. With
> flushing, data is processed in the middle and all the entries are
> reused after flush so it would invalidate all the cache lines
> occasionally.
>
> This is the perf stat result:
>
> * Before
>
> 7,167,414,019 L1-dcache-loads
> 337,471,761 L1-dcache-read-misses # 4.71% of all L1-dcache hits
>
> 11.011224671 seconds time elapsed
>
>
> * After
>
> 7,075,556,792 L1-dcache-loads
> 771,810,388 L1-dcache-read-misses # 10.91% of all L1-dcache hits
>
> 17.015901863 seconds time elapsed
>
>
> Hmm.. it's a memory & time trade-off then. Maybe we need a switch to
> select which one?
I'd keep the faster one ;-) so the one before
jirka
Powered by blists - more mailing lists