Message-Id: <cover.1627643744.git.rickyman7@gmail.com>
Date:   Fri, 30 Jul 2021 17:34:07 +0200
From:   Riccardo Mancini <rickyman7@...il.com>
To:     Arnaldo Carvalho de Melo <acme@...nel.org>
Cc:     Ian Rogers <irogers@...gle.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Jiri Olsa <jolsa@...hat.com>, linux-kernel@...r.kernel.org,
        linux-perf-users@...r.kernel.org,
        Alexey Bayduraev <alexey.v.bayduraev@...ux.intel.com>,
        Riccardo Mancini <rickyman7@...il.com>
Subject: [RFC PATCH v2 00/10] perf: add workqueue library and use it in synthetic-events

Changes in v2:
 - rename threadpool_struct and its functions to adhere to naming style
 - use ERR_PTR instead of returning NULL
 - add *__strerror functions, removing pr_err from library code
 - wait for threads after creation of all threads, instead of waiting
   after each creation
 - use intention-revealing macros in test code instead of 0 and -1
 - use readn/writen functions
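For readers unfamiliar with the ERR_PTR convention and the *__strerror split mentioned above, here is a minimal, self-contained sketch of the pattern. It assumes simplified re-implementations of the kernel-style helpers (tools/include/linux/err.h provides the real ones) so that it compiles on its own. struct pool, pool__new() and pool__strerror() are hypothetical names used only for illustration; they are not the threadpool API from this series.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the helpers in tools/include/linux/err.h:
 * the top MAX_ERRNO pointer values are reserved to carry -errno codes. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical pool object, for illustration only. */
struct pool {
	int nr_threads;
};

/* Returns a valid pointer or ERR_PTR(-errno), never NULL. */
static struct pool *pool__new(int nr_threads)
{
	struct pool *pool;

	if (nr_threads <= 0)
		return ERR_PTR(-EINVAL);
	pool = calloc(1, sizeof(*pool));
	if (!pool)
		return ERR_PTR(-ENOMEM);
	pool->nr_threads = nr_threads;
	return pool;
}

/* Library code formats the error for the caller instead of printing it. */
static int pool__strerror(int err, char *buf, size_t size)
{
	if (err >= 0)
		return -1;
	return snprintf(buf, size, "pool: %s", strerror(-err)) < 0 ? -1 : 0;
}
```

A caller checks IS_ERR(p) rather than p == NULL, and if it fails passes PTR_ERR(p) to pool__strerror() to decide how (and whether) to report the error. The library itself never prints anything.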

v1: https://lkml.kernel.org/lkml/cover.1626177381.git.rickyman7@gmail.com/

This patchset introduces a new utility library inside perf/util that
provides a work queue abstraction loosely following the kernel
workqueue API.

The workqueue abstraction is made up of two components:
 - threadpool: manages a pool of threads. It is inspired by Alexey's
   prototype for threaded trace in perf-record:
   https://lore.kernel.org/lkml/cover.1625227739.git.alexey.v.bayduraev@linux.intel.com/
 - workqueue: manages a shared queue and provides the worker implementation.
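To make this division of labor concrete, here is a minimal, self-contained sketch of a shared-queue workqueue built on plain pthreads. It is not the code from this series. For brevity the thread pool is folded into the workqueue struct. The function names (create_workqueue, queue_work, flush_workqueue, destroy_workqueue) mirror the kernel workqueue API, but their signatures and internals here are assumptions. struct add_work and add_func() are a hypothetical example work item.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct work_struct {
	struct work_struct *next;		/* link in the shared FIFO */
	void (*func)(struct work_struct *work);
};

struct workqueue {
	pthread_mutex_t lock;			/* protects all fields below */
	pthread_cond_t work_cond;		/* work queued or stop requested */
	pthread_cond_t idle_cond;		/* pending dropped to zero */
	struct work_struct *head, *tail;	/* the single shared queue */
	int pending;				/* queued + currently running */
	int stop, nr_threads;
	pthread_t threads[];			/* one entry per worker thread */
};

static void *worker_thread(void *arg)
{
	struct workqueue *wq = arg;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		while (!wq->head && !wq->stop)
			pthread_cond_wait(&wq->work_cond, &wq->lock);
		if (!wq->head)			/* stop requested, queue drained */
			break;
		struct work_struct *work = wq->head;
		wq->head = work->next;
		if (!wq->head)
			wq->tail = NULL;
		pthread_mutex_unlock(&wq->lock);
		work->func(work);		/* run without holding the lock */
		pthread_mutex_lock(&wq->lock);
		if (--wq->pending == 0)
			pthread_cond_broadcast(&wq->idle_cond);
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

static void queue_work(struct workqueue *wq, struct work_struct *work)
{
	work->next = NULL;
	pthread_mutex_lock(&wq->lock);
	if (wq->tail)
		wq->tail->next = work;
	else
		wq->head = work;
	wq->tail = work;
	wq->pending++;
	pthread_cond_signal(&wq->work_cond);
	pthread_mutex_unlock(&wq->lock);
}

static void flush_workqueue(struct workqueue *wq)
{
	pthread_mutex_lock(&wq->lock);
	while (wq->pending > 0)
		pthread_cond_wait(&wq->idle_cond, &wq->lock);
	pthread_mutex_unlock(&wq->lock);
}

/* Ask workers to exit once the queue is empty, then join and free. */
static void destroy_workqueue(struct workqueue *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stop = 1;
	pthread_cond_broadcast(&wq->work_cond);
	pthread_mutex_unlock(&wq->lock);
	for (int i = 0; i < wq->nr_threads; i++)
		pthread_join(wq->threads[i], NULL);
	pthread_cond_destroy(&wq->idle_cond);
	pthread_cond_destroy(&wq->work_cond);
	pthread_mutex_destroy(&wq->lock);
	free(wq);
}

static struct workqueue *create_workqueue(int nr_threads)
{
	struct workqueue *wq;

	if (nr_threads <= 0 ||
	    !(wq = calloc(1, sizeof(*wq) + nr_threads * sizeof(pthread_t))))
		return NULL;
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->work_cond, NULL);
	pthread_cond_init(&wq->idle_cond, NULL);
	for (; wq->nr_threads < nr_threads; wq->nr_threads++)
		if (pthread_create(&wq->threads[wq->nr_threads], NULL, worker_thread, wq)) {
			destroy_workqueue(wq);	/* joins the threads already started */
			return NULL;
		}
	return wq;
}

/* Example work item: `work` is the first member, so the cast is valid. */
struct add_work { struct work_struct work; int value; };
static atomic_int total;
static void add_func(struct work_struct *w) { atomic_fetch_add(&total, ((struct add_work *)w)->value); }
```

A caller embeds struct work_struct in its own item, queues it, and calls flush_workqueue() before reading any results. Every worker pops from the same head pointer under one mutex, so this lock is a contention point. That is presumably what the per-worker queues in the next steps would address.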

On top of the workqueue, a simple parallel-for utility is implemented and
then showcased in synthetic-events.c, where it replaces the previous
manually created pthreads.

In some experiments with perf bench, I observed that the new workqueue
has slightly higher overhead than manually created threads, but it
partitions work among threads more effectively, yielding better results
overall.
Furthermore, the overhead could be reduced by changing `work_size`
(currently 1), i.e. the number of dirents a thread processes before
taking the lock again to fetch the next work item.
I experimented with different sizes: larger sizes reduce overhead as
expected, but they do not scale as well to more threads.
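The role of `work_size` can be illustrated with a minimal chunked parallel-for. This sketch assumes plain pthreads and a mutex-protected "next index" counter rather than the workqueue from this series. The parallel_for() signature, struct pfor_ctx and square_into() are hypothetical. Each lock acquisition hands a thread up to work_size consecutive indices.

```c
#include <pthread.h>

#define MAX_THREADS 64

typedef void (*parallel_for_fn)(int i, void *arg);

struct pfor_ctx {
	pthread_mutex_t lock;	/* taken once per chunk, not once per index */
	int next;		/* first index not yet handed out */
	int end;		/* one past the last index */
	int work_size;		/* indices claimed per lock acquisition */
	parallel_for_fn fn;
	void *arg;
};

static void *pfor_worker(void *p)
{
	struct pfor_ctx *ctx = p;

	for (;;) {
		pthread_mutex_lock(&ctx->lock);
		int start = ctx->next;
		int stop = ctx->end - start < ctx->work_size ?
			   ctx->end : start + ctx->work_size;
		ctx->next = stop;
		pthread_mutex_unlock(&ctx->lock);
		if (start >= stop)		/* range exhausted */
			return NULL;
		for (int i = start; i < stop; i++)
			ctx->fn(i, ctx->arg);
	}
}

/* Run fn(i, arg) for every i in [begin, end) on nr_threads threads. */
static int parallel_for(int begin, int end, int work_size, int nr_threads,
			parallel_for_fn fn, void *arg)
{
	struct pfor_ctx ctx = { .next = begin, .end = end,
				.work_size = work_size, .fn = fn, .arg = arg };
	pthread_t threads[MAX_THREADS];
	int started, err = 0;

	if (work_size < 1 || nr_threads < 1 || nr_threads > MAX_THREADS)
		return -1;
	pthread_mutex_init(&ctx.lock, NULL);
	for (started = 0; started < nr_threads; started++) {
		if (pthread_create(&threads[started], NULL, pfor_worker, &ctx)) {
			err = -1;	/* threads already started (if any) drain the range */
			break;
		}
	}
	for (int t = 0; t < started; t++)
		pthread_join(threads[t], NULL);
	pthread_mutex_destroy(&ctx.lock);
	return err;
}

/* Example loop body: store i*i into the int array passed as arg. */
static void square_into(int i, void *arg)
{
	((int *)arg)[i] = i * i;
}
```

With work_size = 1, every index costs one lock round-trip. Larger chunks amortize that cost, but they also make the final chunks coarser, so some threads can sit idle while others finish. That imbalance is one plausible contributor to the poorer scaling reported above.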

I believe the next steps are, in order:
 - add support for specifying thread affinities.
 - add worker queues with round-robin assignment (replacing the current
   shared queue).
 - optionally add work stealing among the worker queues. I'd leave this
   as future work for now, since we do not have a specific use case for
   it yet (the synthetic threads would not benefit much from it, imo).
 - add support for executing jobs on a specific worker (which will have
   its own affinity). This is useful for Alexey's use case and for the
   evlist open use case described below.
 - make the workqueue manage the threadpool directly (as Arnaldo already
   suggested). We could also make the workqueue global, but that requires
   an autogrow feature so that threads are created on demand rather than
   all at the beginning. I'd leave this as future work too.
 - apply the workqueue to evlist open (the idea is to pin each thread to
   a CPU and submit a work_struct for each evsel/cpu to the matching CPU).
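The per-worker queue and round-robin steps above could look roughly like the following sketch. It assumes each worker owns a bounded FIFO with its own lock, and it uses ints as stand-ins for work items. Worker threads and CPU affinity are deliberately left out. struct worker_queue, struct rr_dispatcher, rr_submit(), submit_on() and the other names are hypothetical, not part of this series.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define QUEUE_CAP 256

/* One FIFO per worker, each with its own lock (ring buffer of ints). */
struct worker_queue {
	pthread_mutex_t lock;
	int items[QUEUE_CAP];
	int head, len;
};

struct rr_dispatcher {
	struct worker_queue *queues;
	int nr_workers;
	atomic_uint next;	/* round-robin cursor */
};

static int worker_queue_push(struct worker_queue *q, int item)
{
	int ret = -1;

	pthread_mutex_lock(&q->lock);
	if (q->len < QUEUE_CAP) {
		q->items[(q->head + q->len) % QUEUE_CAP] = item;
		q->len++;
		ret = 0;
	}
	pthread_mutex_unlock(&q->lock);
	return ret;
}

/* Called by the owning worker: only its own queue's lock is contended. */
static int worker_queue_pop(struct worker_queue *q, int *item)
{
	int ret = -1;

	pthread_mutex_lock(&q->lock);
	if (q->len > 0) {
		*item = q->items[q->head];
		q->head = (q->head + 1) % QUEUE_CAP;
		q->len--;
		ret = 0;
	}
	pthread_mutex_unlock(&q->lock);
	return ret;
}

/* Round robin: the k-th submitted item goes to queue k % nr_workers.
 * Returns the chosen worker index, or -1 if that queue is full. */
static int rr_submit(struct rr_dispatcher *d, int item)
{
	int idx = (int)(atomic_fetch_add(&d->next, 1) % (unsigned)d->nr_workers);

	return worker_queue_push(&d->queues[idx], item) ? -1 : idx;
}

/* Bypass round robin and target one worker (e.g. one pinned to a CPU). */
static int submit_on(struct rr_dispatcher *d, int worker, int item)
{
	if (worker < 0 || worker >= d->nr_workers)
		return -1;
	return worker_queue_push(&d->queues[worker], item);
}

static int rr_init(struct rr_dispatcher *d, int nr_workers)
{
	if (nr_workers < 1 ||
	    !(d->queues = calloc(nr_workers, sizeof(*d->queues))))
		return -1;
	for (int i = 0; i < nr_workers; i++)
		pthread_mutex_init(&d->queues[i].lock, NULL);
	d->nr_workers = nr_workers;
	atomic_init(&d->next, 0);
	return 0;
}

static void rr_exit(struct rr_dispatcher *d)
{
	for (int i = 0; i < d->nr_workers; i++)
		pthread_mutex_destroy(&d->queues[i].lock);
	free(d->queues);
}
```

Compared with a single shared queue, each worker here contends only with submitters for its own lock. submit_on() covers the "execute on a specific worker" step. Work stealing would add a path for an idle worker to call worker_queue_pop() on another worker's queue.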

Any comments or ideas are highly appreciated.

Thanks,
Riccardo

Riccardo Mancini (10):
  perf workqueue: threadpool creation and destruction
  perf tests: add test for workqueue
  perf workqueue: add threadpool start and stop functions
  perf workqueue: add threadpool execute and wait functions
  tools: add sparse context/locking annotations in compiler-types.h
  perf workqueue: introduce workqueue struct
  perf workqueue: implement worker thread and management
  perf workqueue: add queue_work and flush_workqueue functions
  perf workqueue: add utility to execute a for loop in parallel
  perf synthetic-events: use workqueue parallel_for

 tools/include/linux/compiler_types.h   |  18 +
 tools/perf/tests/Build                 |   1 +
 tools/perf/tests/builtin-test.c        |   9 +
 tools/perf/tests/tests.h               |   3 +
 tools/perf/tests/workqueue.c           | 450 +++++++++++++++
 tools/perf/util/Build                  |   1 +
 tools/perf/util/synthetic-events.c     | 155 ++---
 tools/perf/util/workqueue/Build        |   2 +
 tools/perf/util/workqueue/threadpool.c | 674 ++++++++++++++++++++++
 tools/perf/util/workqueue/threadpool.h |  40 ++
 tools/perf/util/workqueue/workqueue.c  | 750 +++++++++++++++++++++++++
 tools/perf/util/workqueue/workqueue.h  |  51 ++
 12 files changed, 2080 insertions(+), 74 deletions(-)
 create mode 100644 tools/perf/tests/workqueue.c
 create mode 100644 tools/perf/util/workqueue/Build
 create mode 100644 tools/perf/util/workqueue/threadpool.c
 create mode 100644 tools/perf/util/workqueue/threadpool.h
 create mode 100644 tools/perf/util/workqueue/workqueue.c
 create mode 100644 tools/perf/util/workqueue/workqueue.h

-- 
2.31.1
