Date:   Mon, 25 Nov 2019 09:04:17 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     Arnaldo Carvalho de Melo <acme@...nel.org>
Cc:     Jiri Olsa <jolsa@...hat.com>, Namhyung Kim <namhyung@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Andi Kleen <ak@...ux.intel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: [PATCH v2 0/3] perf record: adapt NUMA awareness to machines with
 #CPUs > 1K


The current glibc implementation of the cpu_set_t type limits the CPU
mask to 1024 CPUs. On machines with more CPUs than that, this limit
confines the NUMA awareness of the Perf tool in record mode, enabled
through the --affinity option, to the first 1024 CPUs.

This patch set lets the Perf tool overcome the 1024 CPU limit by
using a dedicated struct mmap_cpu_mask type and applying the tool's
bitmap API operations to manipulate the affinity masks of the tool's
thread and of the mmaped data buffers.
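
To illustrate the idea, here is a minimal sketch of a CPU mask whose length
is chosen at run time. It is not the code from the patches: the struct name
cpu_mask_sketch, the CPU_MASK_SKETCH_BYTES() macro and all helper functions
are hypothetical and only assume that the mask is a heap-allocated array of
unsigned long plus a bit count.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical layout: a heap-allocated bitmap plus its length in bits. */
struct cpu_mask_sketch {
	unsigned long *bits;
	size_t nbits;
};

/* Bytes of storage behind the mask, rounded up to whole longs. */
#define CPU_MASK_SKETCH_BYTES(m) \
	(BITS_TO_LONGS((m)->nbits) * sizeof(unsigned long))

/* Allocate a zeroed mask for nr_cpus CPUs; returns 0 on success, -1 on failure. */
static int cpu_mask_sketch_init(struct cpu_mask_sketch *m, size_t nr_cpus)
{
	m->bits = calloc(BITS_TO_LONGS(nr_cpus), sizeof(unsigned long));
	if (!m->bits)
		return -1;
	m->nbits = nr_cpus;
	return 0;
}

/* Mark one CPU as part of the mask. */
static void cpu_mask_sketch_set(struct cpu_mask_sketch *m, size_t cpu)
{
	assert(cpu < m->nbits);
	m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

/* Return 1 if the CPU is in the mask, 0 otherwise. */
static int cpu_mask_sketch_test(const struct cpu_mask_sketch *m, size_t cpu)
{
	assert(cpu < m->nbits);
	return (int)((m->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL);
}

/* Release the storage and reset the mask to the empty state. */
static void cpu_mask_sketch_exit(struct cpu_mask_sketch *m)
{
	free(m->bits);
	m->bits = NULL;
	m->nbits = 0;
}
```

With this layout a mask for 2048 CPUs occupies 256 bytes, and CPU numbers
above 1023 can be set and tested like any other.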

The tools bitmap API has been extended with a bitmap_free() function
and a bitmap_equal() operation, whose implementation is derived from
the kernel's.
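
As a rough illustration of what such helpers do, the sketch below compares
two bitmaps over their first nbits bits and pairs an allocator with a
matching free. The sketch_ prefixed names are hypothetical, and the code is
a simplified stand-in rather than the implementation in the patches or in
the kernel.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Allocate a zeroed bitmap large enough for nbits bits. */
static unsigned long *sketch_bitmap_alloc(size_t nbits)
{
	return calloc(BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

/* Counterpart of sketch_bitmap_alloc(), kept separate for symmetry. */
static void sketch_bitmap_free(unsigned long *bitmap)
{
	free(bitmap);
}

/* Return true if the first nbits bits of a and b are identical. */
static bool sketch_bitmap_equal(const unsigned long *a,
				const unsigned long *b, size_t nbits)
{
	size_t full = nbits / BITS_PER_LONG;	/* number of whole words */
	size_t rem = nbits % BITS_PER_LONG;	/* bits used in the last word */
	size_t k;

	assert(a && b);

	for (k = 0; k < full; k++)
		if (a[k] != b[k])
			return false;

	if (rem) {
		/* Compare only the low rem bits of the partial last word. */
		unsigned long last = (1UL << rem) - 1;

		if ((a[full] ^ b[full]) & last)
			return false;
	}
	return true;
}
```

Masking the last word matters because bits beyond nbits are padding and
must not influence the result.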

---
Alexey Budankov (3):
  tools bitmap: implement bitmap_equal() operation at bitmap API
  perf mmap: declare type for cpu mask of arbitrary length
  perf record: adapt affinity to machines with #CPUs > 1K

 tools/include/linux/bitmap.h | 30 ++++++++++++++++++++++++++++++
 tools/lib/bitmap.c           | 15 +++++++++++++++
 tools/perf/builtin-record.c  | 30 ++++++++++++++++++++++++------
 tools/perf/util/mmap.c       | 31 +++++++++++++++++++++++++------
 tools/perf/util/mmap.h       | 11 ++++++++++-
 5 files changed, 104 insertions(+), 13 deletions(-)

---
Changes in v2:
- implemented bitmap_free() for symmetry with bitmap_alloc()
- capitalized MMAP_CPU_MASK_BYTES() macro
- returned -1 from perf_mmap__setup_affinity_mask()
- implemented releasing of masks using bitmap_free()
- moved debug printing under -vv option

---
Testing:

  tools/perf/perf record -vv --affinity=cpu -- ls
  thread mask[8]: empty
  Using CPUID GenuineIntel-6-5E-3
  intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
  nr_cblocks: 0
  affinity: CPU
  mmap flush: 1
  comp level: 0
  ------------------------------------------------------------
  perf_event_attr:
    size                             112
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|PERIOD
    read_format                      ID
    disabled                         1
    inherit                          1
    mmap                             1
    comm                             1
    freq                             1
    enable_on_exec                   1
    task                             1
    precise_ip                       3
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  sys_perf_event_open: pid 28649  cpu 0  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid 28649  cpu 1  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid 28649  cpu 2  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid 28649  cpu 3  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid 28649  cpu 4  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid 28649  cpu 5  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid 28649  cpu 6  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid 28649  cpu 7  group_fd -1  flags 0x8 = 13
  mmap size 528384B
  0x7f1898200010: mmap mask[8]: 0
  0x7f18982100d8: mmap mask[8]: 1
  0x7f18982201a0: mmap mask[8]: 2
  0x7f1898230268: mmap mask[8]: 3
  0x7f1898240330: mmap mask[8]: 4
  0x7f18982503f8: mmap mask[8]: 5
  0x7f18982604c0: mmap mask[8]: 6
  0x7f1898270588: mmap mask[8]: 7
  ------------------------------------------------------------
  perf_event_attr:
    type                             1
    size                             112
    config                           0x9
    watermark                        1
    sample_id_all                    1
    bpf_event                        1
    { wakeup_events, wakeup_watermark } 1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 21
  ...
  Synthesizing TSC conversion information
  thread mask[8]: 0
  thread mask[8]: 1
  thread mask[8]: 2
  thread mask[8]: 3
  thread mask[8]: 4
  arch			      copy     Documentation  init     kernel	 MAINTAINERS	  modules.builtin.modinfo  perf.data	  scripts   System.map	vmlinux
  block			      COPYING  drivers	      ipc      lbuild	 Makefile	  modules.order		   perf.data.old  security  tools	vmlinux.o
  certs			      CREDITS  fs	      Kbuild   lib	 mm		  Module.symvers	   README	  sound     usr
  config-5.2.7-100.fc29.x86_64  crypto   include	      Kconfig  LICENSES  modules.builtin  net			   samples	  stdio     virt
  thread mask[8]: 5
  thread mask[8]: 6
  thread mask[8]: 7
  thread mask[8]: 0
  thread mask[8]: 1
  thread mask[8]: 2
  thread mask[8]: 3
  thread mask[8]: 4
  thread mask[8]: 5
  thread mask[8]: 6
  thread mask[8]: 7
  [ perf record: Woken up 0 times to write data ]
  thread mask[8]: 0
  thread mask[8]: 1
  thread mask[8]: 2
  thread mask[8]: 3
  thread mask[8]: 4
  thread mask[8]: 5
  thread mask[8]: 6
  thread mask[8]: 7
  Looking at the vmlinux_path (8 entries long)
  Using vmlinux for symbols
  [ perf record: Captured and wrote 0.014 MB perf.data (8 samples) ]

-- 
2.20.1
