Message-ID: <20171023114822.ijbixdkhysinlwqv@gmail.com>
Date:   Mon, 23 Oct 2017 13:48:22 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     kan.liang@...el.com
Cc:     acme@...nel.org, mingo@...hat.com, linux-kernel@...r.kernel.org,
        peterz@...radead.org, jolsa@...nel.org, wangnan0@...wei.com,
        hekuang@...wei.com, namhyung@...nel.org,
        alexander.shishkin@...ux.intel.com, adrian.hunter@...el.com,
        ak@...ux.intel.com
Subject: Re: [PATCH V3 0/6] event synthesization multithreading for perf
 record


* kan.liang@...el.com <kan.liang@...el.com> wrote:

> The latency is the time cost of __machine__synthesize_threads or
> its multithreading replacement, record__multithread_synthesize.
> 
> Original:              original single thread synthesize
> With patch(not merge): multithread synthesize without final file merge
>                        (intermediate results for scalability measurement)
> With patch(merge):     multithread synthesize with file merge
>                        (final result)
> 
> - Latency on Knights Mill (272 CPUs)
> 
> Original(s)    With patch(not merge)(s)    With patch(merge)(s)
> 12.7           6.6                         7.76
> 
> - Latency on Skylake server (192 CPUs)
> 
> Original(s)    With patch(not merge)(s)    With patch(merge)(s)
> 0.34           0.21                        0.23
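
To make the quoted numbers easier to compare, here is a small sketch that derives the speedup and the cost of the final file merge from the two tables above. The function and variable names are illustrative only and are not part of the patch series; the latencies are copied from the quoted measurements.

```python
# Latencies in seconds, copied from the quoted tables:
# (original, with patch without merge, with patch with merge)
LATENCIES = {
    "Knights Mill (272 CPUs)": (12.7, 6.6, 7.76),
    "Skylake server (192 CPUs)": (0.34, 0.21, 0.23),
}


def summarize(latencies):
    """Return {machine: (speedup_no_merge, speedup_merge, merge_cost_s)}."""
    summary = {}
    for machine, (original, no_merge, merged) in latencies.items():
        summary[machine] = (
            round(original / no_merge, 2),  # speedup before the file merge
            round(original / merged, 2),    # speedup of the final result
            round(merged - no_merge, 2),    # time spent merging files
        )
    return summary


if __name__ == "__main__":
    for machine, (s_nm, s_m, cost) in summarize(LATENCIES).items():
        print(f"{machine}: {s_nm}x without merge, {s_m}x with merge, "
              f"merge cost {cost}s")
```

On these figures the final result is roughly 1.6x faster on Knights Mill and roughly 1.5x faster on the Skylake server, with the merge step accounting for about 1.16s and 0.02s respectively.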

Ok, I think I misunderstood some aspects of the patch series.

It multi-threads a certain stage of processing (synthesizing), but not the _whole_ 
process of recording events, right?

So I'm wondering, in the context of 'perf record -a' and 'perf top' CPU-granular 
profiling at least (but maybe also in the context of inherited workload 'perf 
record' profiling), could we simply record with per-CPU recording threads created 
early on, which would record into the percpu files quite naturally, which would 
also offer natural multithreading of any 'synthesizing' steps later on?

I.e. instead of multithreading perf record piecemeal, why not multithread it 
all - and win big in terms of scalable, low-overhead profiling?
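
As a rough illustration of the structure being proposed (not perf's actual code), the sketch below starts one recording thread per CPU up front, has each thread write to its own per-CPU file, and merges the files by timestamp afterwards. Everything here is a hypothetical stand-in: the `perf.data.<cpu>` file naming, the text record format, and the function names are assumptions made for the example, and Python threads model only the shape of the design, not its performance.

```python
import os
import tempfile
import threading


def record_cpu(cpu, events, out_dir):
    """Hypothetical per-CPU recorder: write this CPU's events to its own file."""
    path = os.path.join(out_dir, f"perf.data.{cpu}")  # assumed naming scheme
    with open(path, "w") as f:
        for timestamp, name in events:
            f.write(f"{timestamp} {cpu} {name}\n")


def record_all(per_cpu_events, out_dir):
    """Start one recording thread per CPU early on, then wait for all of them."""
    threads = [
        threading.Thread(target=record_cpu, args=(cpu, events, out_dir))
        for cpu, events in per_cpu_events.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [os.path.join(out_dir, f"perf.data.{cpu}")
            for cpu in sorted(per_cpu_events)]


def merge(paths):
    """Merge the per-CPU files into one list ordered by timestamp."""
    records = []
    for path in paths:
        with open(path) as f:
            for line in f:
                timestamp, cpu, name = line.split()
                records.append((int(timestamp), int(cpu), name))
    records.sort()
    return records


def demo():
    # Two CPUs with made-up (timestamp, event name) samples.
    per_cpu_events = {0: [(1, "mmap"), (4, "comm")],
                      1: [(2, "fork"), (3, "exit")]}
    with tempfile.TemporaryDirectory() as out_dir:
        return merge(record_all(per_cpu_events, out_dir))


if __name__ == "__main__":
    for record in demo():
        print(record)
```

Because no thread shares its output file with another, the recording path needs no locking; the only serial step left is the final merge.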

	Ingo
