Date:   Tue, 6 Oct 2020 14:40:32 +0900
From:   namhyung@...nel.org
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Stephane Eranian <eranian@...gle.com>,
        Ian Rogers <irogers@...gle.com>,
        Al Grant <al.grant@...s.arm.com>,
        Adrian Hunter <adrian.hunter@...el.com>
Subject: Re: [PATCH] perf inject: Flush ordered events on FINISHED_ROUND

On Tue, Oct 06, 2020 at 11:39:49AM +0900, namhyung@...nel.org wrote:
> > > On Fri, Oct 02, 2020 at 10:03:17PM +0900, Namhyung Kim wrote:
> > > > Below measures time and memory usage during the perf inject and
> > > > report using ~190MB data file.
> > > >
> > > > Before:
> > > >   perf inject:  11.09 s,  382148 KB
> > > >   perf report:   8.05 s,  397440 KB
> > > >
> > > > After:
> > > >   perf inject:  16.24 s,   83376 KB
> > > >   perf report:   7.96 s,  216184 KB
> > > >
> > > > As you can see, it used 2x the memory of the input size.  I guess
> > > > it's because it needs to keep a copy of the whole input.  But I
> > > > don't understand why the processing time of perf inject increased.
> 
> Measuring it with time shows:
> 
>            before       after
>   real    11.309s     17.040s
>   user     8.084s     13.940s
>   sys      6.535s      6.732s
> 
> So it's user space that makes the difference.  I've run perf record on
> both (with cycles:U) and the dominant function is the same: queue_event
> (46.98% vs 65.87%).
> 
> It seems that flushing the queue adds more sorting overhead.

So I suspect the cache-miss ratio affects the performance.  With
flushing, data is processed mid-stream and all the queue entries are
reused after each flush, so the cached lines get invalidated
periodically.

This is the perf stat result:

* Before

     7,167,414,019      L1-dcache-loads                                             
       337,471,761      L1-dcache-read-misses     #    4.71% of all L1-dcache hits  

      11.011224671 seconds time elapsed


* After

     7,075,556,792      L1-dcache-loads                                             
       771,810,388      L1-dcache-read-misses     #   10.91% of all L1-dcache hits  

      17.015901863 seconds time elapsed
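(For reference, counter pairs like these come from a plain perf stat
run; the exact event names vary by kernel and CPU -- the output above
shows L1-dcache-read-misses, while many setups expose
L1-dcache-load-misses instead.  A sketch of the kind of invocation,
with a placeholder data file:)

```shell
# Sketch only: event names and the perf.data path are examples.
perf stat -e L1-dcache-loads,L1-dcache-load-misses \
    -- perf inject -i perf.data -o /dev/null
```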


Hmm.. it's a memory & time trade-off then.  Maybe we need a switch to
select which one?

Thanks
Namhyung
