Message-Id: <20180124115143.14322-1-jolsa@kernel.org>
Date:   Wed, 24 Jan 2018 12:51:22 +0100
From:   Jiri Olsa <jolsa@...nel.org>
To:     Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Ingo Molnar <mingo@...nel.org>
Cc:     lkml <linux-kernel@...r.kernel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        David Ahern <dsahern@...il.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Andy Lutomirski <luto@...capital.net>,
        Arnaldo Carvalho de Melo <acme@...nel.org>
Subject: [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function

hi,
this RFC delays the retrieval of a sample's user space
data into task work, as originally described and
discussed by Peter and Ingo in [1].

This patchset tries to follow the original patch with
some kernel changes (described below) and perf tool
support included.

Basically we allow the NMI event code to skip user data
retrieval and schedule task work to do it, before the
task resumes.

Using task work limits the window in which we can do
this. We can defer to task work only if the task work
gets executed before the process executes again after
the NMI, because we need its stack as it was at NMI time.
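
To illustrate the idea, here is a minimal user space sketch of the
deferral, not the code from the patches. The struct task fields and
the nmi_sample()/exit_to_user() helpers are hypothetical names made
up for this example; the pending flag only plays the role that a bit
like TIF_PERF_USER_DATA would play, and reusing the ID for samples
that arrive while work is already pending is an assumption of the
sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a task; field names are illustrative only. */
struct task {
	bool     user_data_pending;  /* role of a TIF_PERF_USER_DATA-like bit */
	uint64_t pending_id;         /* links the delayed record to its sample */
	uint64_t next_id;            /* last ID handed out, 0 means none yet */
	int      samples_emitted;    /* samples written from the "NMI" path */
	int      user_data_emitted;  /* delayed user data records written */
};

/*
 * "NMI" path: emit the sample without touching user space data and
 * mark the task so the data gets collected later. Returns the ID that
 * ties this sample to the delayed record.
 */
static uint64_t nmi_sample(struct task *t)
{
	t->samples_emitted++;

	/* Assumption: samples hitting an already pending task share its ID. */
	if (!t->user_data_pending) {
		t->pending_id = ++t->next_id;
		t->user_data_pending = true;
	}
	return t->pending_id;
}

/*
 * Return-to-user path: runs before the task executes user code again,
 * so the user stack is still as it was at NMI time. Returns the ID of
 * the record emitted, or 0 if nothing was pending.
 */
static uint64_t exit_to_user(struct task *t)
{
	uint64_t id;

	if (!t->user_data_pending)
		return 0;

	id = t->pending_id;
	assert(id != 0);             /* IDs start at 1 in this sketch */
	t->user_data_emitted++;      /* here the real code would read user data */
	t->user_data_pending = false;
	return id;
}
```

Calling nmi_sample() and then exit_to_user() on the same task yields
matching IDs, which is what lets a tool stitch the delayed data back
onto the sample it belongs to.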

That leaves us with a window during the slow syscall path
(check task_struct::perf_user_data_allowed in patches).

Slow syscall processing is forced for a task when
the user data event is enabled, which makes the task
slower.

On the other hand, I noticed a roughly 100us drop in NMI
processing times, which I plotted in [2].

Not sure it's worth introducing this processing, which adds
more processing time and does not show much improvement. On
the other hand, IIRC Peter mentioned it'd be nice to get user
space data retrieval out of NMI.

Also, you guys might think of some other better/faster way ;-)

NOTE: I also implemented putting the user stack data into
delayed processing, which showed nicer numbers. But it's a
little more tricky and brings more changes into this already
big patchset. The logic stays the same, so I did not include
it to keep the patchset simple.

Also available in:
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  perf/user_data

thanks for comments,
jirka

[1] https://marc.info/?l=linux-kernel&m=150098372819938&w=2
[2] http://people.redhat.com/~jolsa/ud-bench.png

---
Jiri Olsa (21):
      perf tools: Add perf_evsel__is_sample_bit function
      perf tools: Add perf_sample__process function
      perf tools: Add callchain__printf for pure callchain dump
      perf tools: Add perf_sample__copy|free functions
      perf: Add TIF_PERF_USER_DATA bit
      perf: Add PERF_RECORD_USER_DATA event processing
      perf: Add PERF_SAMPLE_USER_DATA_ID sample type
      perf: Add PERF_SAMPLE_CALLCHAIN to user data event
      perf: Export running sample length values through debugfs
      perf tools: Sync perf_event.h uapi header
      perf tools: Add perf_sample__parse function
      perf tools: Add struct parse_args arg to perf_sample__parse
      perf tools: Add support to parse user data event
      perf tools: Add support to dump user data event info
      perf report: Add delayed user data event processing
      perf record: Enable delayed user data events
      perf script: Add support to display user data events
      perf script: Add support to display user data ID
      perf script: Display USER_DATA misc char for sample
      perf report: Add user data processing stats
      perf report: Add --stats=ud option to display user data debug info

 arch/x86/entry/common.c                  |   6 +++
 arch/x86/events/core.c                   |  18 ++++++++
 arch/x86/events/intel/ds.c               |   4 +-
 arch/x86/include/asm/thread_info.h       |   4 +-
 include/linux/init_task.h                |   4 +-
 include/linux/perf_event.h               |   3 ++
 include/linux/sched.h                    |  20 ++++++++
 include/uapi/linux/perf_event.h          |  34 +++++++++++++-
 kernel/events/core.c                     | 283 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 tools/include/uapi/linux/perf_event.h    |  34 +++++++++++++-
 tools/perf/Documentation/perf-script.txt |   3 +-
 tools/perf/builtin-record.c              |   2 +
 tools/perf/builtin-report.c              | 301 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
 tools/perf/builtin-script.c              |  98 +++++++++++++++++++++++++++++++++++++++
 tools/perf/perf.h                        |   1 +
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/event.h                  |   9 ++++
 tools/perf/util/evsel.c                  | 118 +++++++++++++++++++++++++++++++++++++----------
 tools/perf/util/evsel.h                  |   5 ++
 tools/perf/util/session.c                |  60 +++++++++++++++++++-----
 tools/perf/util/thread.c                 |   1 +
 tools/perf/util/thread.h                 |  16 +++++++
 tools/perf/util/tool.h                   |   1 +
 23 files changed, 954 insertions(+), 72 deletions(-)
