linux-kernel - RE: [PATCH V3 0/6] event synthesization multithreading for perf record

Message-ID: <37D7C6CF3E00A74B8858931C1DB2F077537D874E@SHSMSX103.ccr.corp.intel.com>
Date:   Mon, 23 Oct 2017 13:43:39 +0000
From:   "Liang, Kan" <kan.liang@...el.com>
To:     Ingo Molnar <mingo@...nel.org>
CC:     "acme@...nel.org" <acme@...nel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "jolsa@...nel.org" <jolsa@...nel.org>,
        "wangnan0@...wei.com" <wangnan0@...wei.com>,
        "hekuang@...wei.com" <hekuang@...wei.com>,
        "namhyung@...nel.org" <namhyung@...nel.org>,
        "alexander.shishkin@...ux.intel.com" 
        <alexander.shishkin@...ux.intel.com>,
        "Hunter, Adrian" <adrian.hunter@...el.com>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>
Subject: RE: [PATCH V3 0/6] event synthesization multithreading for perf
 record

> * kan.liang@...el.com <kan.liang@...el.com> wrote:
> 
> > The latency is the time cost of __machine__synthesize_threads or its
> > multithreading replacement, record__multithread_synthesize.
> >
> > Original:              original single thread synthesize
> > With patch(not merge): multithread synthesize without final file merge
> >                        (intermediate results for scalability measurement)
> > With patch(merge):     multithread synthesize with file merge
> >                        (final result)
> >
> > - Latency on Knights Mill (272 CPUs)
> >
> > Original(s)	With patch(not merge)(s)	With patch(merge)(s)
> > 12.7		6.6				7.76
> >
> > - Latency on Skylake server (192 CPUs)
> >
> > Original(s)	With patch(not merge)(s)	With patch(merge)(s)
> > 0.34		0.21				0.23
> 
> Ok, I think I mis-understood some aspects of the patch series.
> 
> It multi-threads a certain stage of processing (synthesizing), but not the
> _whole_ process of recording events, right?

Right.

> 
> So I'm wondering, in the context of 'perf record -a' and 'perf top' CPU-
> granular profiling at least (but maybe also in the context of inherited
> workload 'perf record' profiling), could we simply record with per-CPU
> recording threads created early on, which would record into the percpu files
> quite naturally, which would also offer natural multithreading of any
> 'synthesizing' steps later on?
> 
> I.e. instead of multithreading perf record piecemeal wise, why not
> multithread it all - and win big in terms of scalable, low overhead profiling?
>

By 'all', do you mean the whole process?
I think that's the ultimate goal. Eventually, per-CPU recording threads will be
created at the start of perf record and will run through the whole process.
The plan is to add multithreading step by step, starting with the simplest case.
The synthesizing stage is just the start.
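As a rough illustration of the per-CPU design discussed above, here is a minimal sketch, assuming one thread per CPU created up front, each writing only to its own output so that no lock is needed on the hot path. All names (`cpu_recorder`, `record_cpu`, `record_per_cpu`, `MAX_RECORDS`) are hypothetical and not part of perf; an in-memory log stands in for the per-CPU file, and the "events" are placeholders rather than data read from a perf mmap ring buffer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_RECORDS 64

/* Hypothetical per-CPU recorder. In a real design each thread would drain
 * one CPU's ring buffer into its own per-CPU file; a small in-memory log
 * stands in for that file here. */
struct cpu_recorder {
	int cpu;
	pthread_t tid;
	size_t nr_records;
	uint64_t records[MAX_RECORDS];
};

/* Thread body: touches only its own recorder, so no locking is needed. */
static void *record_cpu(void *arg)
{
	struct cpu_recorder *rec = arg;

	/* Placeholder "events", each tagged with the CPU number. */
	for (uint64_t i = 0; i < 8 && rec->nr_records < MAX_RECORDS; i++)
		rec->records[rec->nr_records++] = ((uint64_t)rec->cpu << 32) | i;
	return NULL;
}

/* Create one recording thread per CPU at startup, then wait for all of
 * them. Returns the recorder array (caller frees) or NULL on failure. */
static struct cpu_recorder *record_per_cpu(int nr_cpus)
{
	struct cpu_recorder *recs;
	int started = 0;

	assert(nr_cpus > 0);
	recs = calloc((size_t)nr_cpus, sizeof(*recs));
	if (!recs)
		return NULL;

	for (; started < nr_cpus; started++) {
		recs[started].cpu = started;
		if (pthread_create(&recs[started].tid, NULL, record_cpu,
				   &recs[started]) != 0)
			break;
	}

	/* Join every thread that was actually started, even on failure. */
	for (int cpu = 0; cpu < started; cpu++)
		pthread_join(recs[cpu].tid, NULL);

	if (started < nr_cpus) {
		free(recs);
		return NULL;
	}
	return recs;
}
```

A caller could size this with `sysconf(_SC_NPROCESSORS_ONLN)`. A real recorder would also pin each thread to its CPU and keep running until the session stops, rather than recording a fixed batch and exiting.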

For the synthesizing stage alone, I think the patch series should already cover all
the 'synthesizing' steps that can be multithreaded. The remaining 'synthesizing' steps
only need to be done by a single thread.
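The quoted latency numbers distinguish multithreaded synthesis with and without the final merge. The following minimal sketch shows that shape under stated assumptions: a task list is split into contiguous slices, each thread writes its results to its own buffer (the "not merge" result), and the per-thread buffers are then concatenated serially (the "merge" step). Everything here is hypothetical: `synth_event`, `synth_worker`, `synthesize_slice` and `synthesize_multithreaded` are not perf functions, a plain PID array replaces the `/proc` walk, and each "event" just records a PID rather than a real COMM/MMAP record.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one synthesized event. */
struct synth_event {
	int pid;
};

struct synth_worker {
	pthread_t tid;
	const int *pids;         /* this worker's slice of the task list */
	size_t nr_pids;
	struct synth_event *out; /* per-thread output, like a per-thread file */
	size_t nr_out;
};

static void *synthesize_slice(void *arg)
{
	struct synth_worker *w = arg;

	for (size_t i = 0; i < w->nr_pids; i++)
		w->out[w->nr_out++].pid = w->pids[i];
	return NULL;
}

/* Split pids[] into nr_threads slices, synthesize them in parallel, then
 * merge the per-thread outputs into merged[] (room for nr_pids entries)
 * in thread order. Returns the number of merged events, or -1 on error. */
static long synthesize_multithreaded(const int *pids, size_t nr_pids,
				     int nr_threads, struct synth_event *merged)
{
	size_t chunk, extra, start = 0;
	struct synth_worker *w;
	int created = 0;
	long total = 0;

	assert(nr_threads > 0);
	w = calloc((size_t)nr_threads, sizeof(*w));
	if (!w)
		return -1;
	chunk = nr_pids / (size_t)nr_threads;
	extra = nr_pids % (size_t)nr_threads;

	for (int t = 0; t < nr_threads; t++) {
		w[t].pids = pids + start;
		w[t].nr_pids = chunk + ((size_t)t < extra ? 1 : 0);
		start += w[t].nr_pids;
		w[t].out = calloc(w[t].nr_pids ? w[t].nr_pids : 1, sizeof(*w[t].out));
		if (!w[t].out ||
		    pthread_create(&w[t].tid, NULL, synthesize_slice, &w[t]) != 0) {
			total = -1;
			break;
		}
		created++;
	}

	/* Wait for every thread that was actually started. */
	for (int t = 0; t < created; t++)
		pthread_join(w[t].tid, NULL);

	/* Serial merge step: concatenate the per-thread results in order. */
	if (total == 0) {
		for (int t = 0; t < nr_threads; t++) {
			memcpy(merged + total, w[t].out, w[t].nr_out * sizeof(*merged));
			total += (long)w[t].nr_out;
		}
	}

	for (int t = 0; t < nr_threads; t++)
		free(w[t].out);
	free(w);
	return total;
}
```

In this sketch the merge runs serially after every worker has finished. A copy like that is one plausible contributor to the gap between the "not merge" and "merge" columns, but the quoted measurements do not break that cost down.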

Since only the 'synthesizing' step is multithreaded so far, the thread-creation code
lives in event.c for now. It would be better to move it to a dedicated file and make it
generic for recording threads. I think we can do that separately later.


Thanks,
Kan