Message-Id: <1508529934-369393-1-git-send-email-kan.liang@intel.com>
Date:   Fri, 20 Oct 2017 13:05:28 -0700
From:   kan.liang@...el.com
To:     acme@...nel.org, mingo@...hat.com, linux-kernel@...r.kernel.org
Cc:     peterz@...radead.org, jolsa@...nel.org, wangnan0@...wei.com,
        hekuang@...wei.com, namhyung@...nel.org,
        alexander.shishkin@...ux.intel.com, adrian.hunter@...el.com,
        ak@...ux.intel.com, Kan Liang <Kan.liang@...el.com>
Subject: [PATCH V3 0/6] event synthesization multithreading for perf record

From: Kan Liang <Kan.liang@...el.com>

Multithreaded event synthesization was introduced in the "perf top
optimization" series (https://lkml.org/lkml/2017/9/29/269), but it was
not enabled for perf record, because perf record's process function,
process_synthesized_event, was not multithreading friendly.

This patch series temporarily stores the processing results in per-thread
files, so that the processing can run in parallel. At the end of event
synthesization, the per-thread files are dumped one by one into perf.data.
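The per-thread file scheme can be illustrated with a minimal C sketch using POSIX threads. It rests on a few assumptions. Each thread writes only to its own tmpfile(), so the process function needs no locking. The files are then dumped into the output one at a time, in thread order. All names here (synthesize_worker, synthesize_and_merge, NR_THREADS, ITEMS_PER_THREAD) are hypothetical and are not perf's actual functions, and the "events" are placeholder text lines.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define NR_THREADS       4  /* hypothetical number of synthesize threads */
#define ITEMS_PER_THREAD 3  /* hypothetical amount of work per thread */

struct worker {
	int id;
	FILE *tmp;  /* private to this thread, so no locking is needed */
};

/* Stand-in for event synthesization: write only to this thread's file. */
static void *synthesize_worker(void *arg)
{
	struct worker *w = arg;

	for (int i = 0; i < ITEMS_PER_THREAD; i++)
		fprintf(w->tmp, "thread %d event %d\n", w->id, i);
	return NULL;
}

/* Copy the whole content of src to the current position of dst. */
static int append_file(FILE *dst, FILE *src)
{
	char buf[4096];
	size_t n;

	assert(dst != NULL && src != NULL);
	rewind(src);  /* switch the update stream from writing to reading */
	while ((n = fread(buf, 1, sizeof(buf), src)) > 0)
		if (fwrite(buf, 1, n, dst) != n)
			return -1;
	return ferror(src) ? -1 : 0;
}

/* Run the workers in parallel, then dump their files one by one into out. */
int synthesize_and_merge(FILE *out)
{
	struct worker workers[NR_THREADS];
	pthread_t tids[NR_THREADS];
	int nr_files = 0, nr_started = 0, err = 0;

	for (; nr_files < NR_THREADS; nr_files++) {
		workers[nr_files].id = nr_files;
		/* tmpfile() is deleted automatically when it is closed */
		workers[nr_files].tmp = tmpfile();
		if (!workers[nr_files].tmp) {
			err = -1;
			goto out_close;
		}
	}

	for (; nr_started < NR_THREADS; nr_started++)
		if (pthread_create(&tids[nr_started], NULL, synthesize_worker,
				   &workers[nr_started])) {
			err = -1;
			break;
		}

	for (int i = 0; i < nr_started; i++)
		pthread_join(tids[i], NULL);

	/* Sequential merge keeps the output deterministic: thread 0 first. */
	for (int i = 0; !err && i < NR_THREADS; i++)
		err = append_file(out, workers[i].tmp);

out_close:
	for (int i = 0; i < nr_files; i++)
		fclose(workers[i].tmp);
	return err;
}
```

With this design, the parallel part is contention free. The price is a serial copy at the end, which corresponds to the gap between the "not merge" and "merge" columns in the results below.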

The source code is also available at
https://github.com/kliang2/perf.git perf_record_opt

Usually, event synthesization happens only once, at either the start or the
end of recording. With the snapshotting code, events are synthesized multiple
times, once for each new perf.data file. Both cases have been verified.

Here are the latency test results on Knights Mill and Skylake server:

The workload compiles the Linux kernel as below:
"sudo nice make -j$(grep -c '^processor' /proc/cpuinfo)"
Then, run "sudo perf record -e cycles -a -- sleep 1".

The latency is the time cost of __machine__synthesize_threads or
its multithreading replacement, record__multithread_synthesize.

Original:              original single-threaded synthesize
With patch(not merge): multithreaded synthesize without final file merge
                       (intermediate results for scalability measurement)
With patch(merge):     multithreaded synthesize with file merge
                       (final result)

- Latency on Knights Mill (272 CPUs)

Original(s)   With patch(not merge)(s)   With patch(merge)(s)
12.7          6.6                        7.76

- Latency on Skylake server (192 CPUs)

Original(s)   With patch(not merge)(s)   With patch(merge)(s)
0.34          0.21                       0.23
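A small C helper makes the arithmetic behind the two tables explicit. It computes the end-to-end speedup of the merged result over the original, and the time spent on the final merge (the "merge" column minus the "not merge" column). The numbers are copied from the tables. They work out to roughly 1.64x and 1.48x speedups, with merge costs of about 1.16 s and 0.02 s. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Latencies copied from the tables above, in seconds. */
struct latency {
	const char *machine;
	double original;  /* single-threaded synthesize */
	double no_merge;  /* multithreaded, without final file merge */
	double merge;     /* multithreaded, with final file merge */
};

static const struct latency results[] = {
	{ "Knights Mill",   12.7, 6.6,  7.76 },
	{ "Skylake server", 0.34, 0.21, 0.23 },
};

/* Speedup of the complete (merged) result over the original. */
static double speedup(const struct latency *l)
{
	assert(l->merge > 0);
	return l->original / l->merge;
}

/* Extra time spent dumping the per-thread files into perf.data. */
static double merge_cost(const struct latency *l)
{
	return l->merge - l->no_merge;
}

void print_summary(void)
{
	for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
		printf("%-15s speedup %.2fx, merge cost %.2f s\n",
		       results[i].machine, speedup(&results[i]),
		       merge_cost(&results[i]));
}
```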

Changes since V2:
 - Introduce a new interface to automatically generate a tmp file (Patch 4/6).
   The tmp file is removed when it is closed. (jirka)
 - Move all checks to record__multithread_synthesize (jirka)
 - Minor changes to record__multithread_synthesize
 - Update test data (Ingo)
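The V2 change adds an interface that generates a temporary file and removes it when it is closed. One way to get that behavior with POSIX mkstemp() and unlink() is shown below. This is a hypothetical sketch, not the actual perf_data_file__open_tmp implementation. The struct, the function names, and the "perf-tmp-XXXXXX" file name pattern are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for a data file opened as a temporary. */
struct tmp_data_file {
	char path[64];  /* template, rewritten by mkstemp() */
	int fd;
};

/* Create a uniquely named file under dir; returns 0 on success. */
static int tmp_data_file__open(struct tmp_data_file *f, const char *dir)
{
	int n = snprintf(f->path, sizeof(f->path), "%s/perf-tmp-XXXXXX", dir);

	if (n < 0 || (size_t)n >= sizeof(f->path))
		return -1;
	f->fd = mkstemp(f->path);  /* replaces XXXXXX with a unique suffix */
	return f->fd < 0 ? -1 : 0;
}

/* Close the descriptor and delete the file, so nothing is left behind. */
static int tmp_data_file__close(struct tmp_data_file *f)
{
	int err;

	assert(f->fd >= 0);
	err = close(f->fd);
	if (unlink(f->path))
		err = -1;
	f->fd = -1;
	return err;
}
```

Another option is to unlink() right after mkstemp(), so the file disappears even if the process crashes. The trade-off is that the path can no longer be used to reopen the file.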

Changes since V1:
 - Dump the synthesized results to per-thread files and merge them into
   perf.data at the end. (Arnaldo)

Kan Liang (6):
  perf tools: pass thread info to process function
  perf tools: pass thread info in event synthesization
  perf tools: expose copyfile_offset()
  perf tools: add perf_data_file__open_tmp
  perf record: synthesize event multithreading support
  perf record: add option to set the number of thread for event
    synthesize

 tools/perf/Documentation/perf-record.txt |   4 ++
 tools/perf/arch/x86/util/tsc.c           |   2 +-
 tools/perf/builtin-inject.c              |  12 +++-
 tools/perf/builtin-record.c              | 114 ++++++++++++++++++++++++++++---
 tools/perf/builtin-sched.c               |  12 ++--
 tools/perf/builtin-stat.c                |   3 +-
 tools/perf/builtin-trace.c               |   3 +-
 tools/perf/tests/cpumap.c                |   6 +-
 tools/perf/tests/dwarf-unwind.c          |   6 +-
 tools/perf/tests/event_update.c          |  12 ++--
 tools/perf/tests/stat.c                  |   9 ++-
 tools/perf/tests/thread-map.c            |   3 +-
 tools/perf/util/auxtrace.c               |   2 +-
 tools/perf/util/data.c                   |  26 +++++++
 tools/perf/util/data.h                   |   2 +
 tools/perf/util/event.c                  | 111 ++++++++++++++++++------------
 tools/perf/util/event.h                  |  19 ++++--
 tools/perf/util/header.c                 |  16 ++---
 tools/perf/util/intel-bts.c              |   3 +-
 tools/perf/util/intel-pt.c               |   3 +-
 tools/perf/util/session.c                |   4 +-
 tools/perf/util/util.c                   |   2 +-
 tools/perf/util/util.h                   |   2 +
 23 files changed, 284 insertions(+), 92 deletions(-)
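Patch 3/6 exposes copyfile_offset() so that a file's contents can be copied at an explicit offset. This cover letter shows neither its signature nor its implementation. The C sketch below is therefore a hypothetical copy_range(), one plausible way to copy a byte range between two descriptors. It uses POSIX pread()/pwrite(), which take explicit offsets and leave both file positions untouched.

```c
#define _POSIX_C_SOURCE 200809L  /* for pread(), pwrite(), fileno() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy size bytes from ifd@off_in to ofd@off_out. */
static int copy_range(int ifd, off_t off_in, int ofd, off_t off_out,
		      size_t size)
{
	char buf[8192];

	assert(ifd >= 0 && ofd >= 0);
	while (size > 0) {
		size_t chunk = size < sizeof(buf) ? size : sizeof(buf);
		ssize_t n = pread(ifd, buf, chunk, off_in);

		if (n < 0 && errno == EINTR)
			continue;  /* interrupted, retry the read */
		if (n < 0) {
			perror("pread");
			return -1;
		}
		if (n == 0) {
			fprintf(stderr, "copy_range: input ended early\n");
			return -1;
		}
		/* pwrite() may write less than asked, so loop until done */
		for (ssize_t done = 0; done < n; ) {
			ssize_t w = pwrite(ofd, buf + done, (size_t)(n - done),
					   off_out + done);

			if (w < 0 && errno == EINTR)
				continue;
			if (w < 0) {
				perror("pwrite");
				return -1;
			}
			done += w;
		}
		off_in += n;
		off_out += n;
		size -= (size_t)n;
	}
	return 0;
}
```

To merge per-thread files, the caller would keep a running output offset and add each file's size to it after copying that file.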

-- 
2.7.4
