Message-Id: <cover.1626177381.git.rickyman7@gmail.com>
Date: Tue, 13 Jul 2021 14:11:11 +0200
From: Riccardo Mancini <rickyman7@...il.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Ian Rogers <irogers@...gle.com>,
Namhyung Kim <namhyung@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Jiri Olsa <jolsa@...hat.com>, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org,
Riccardo Mancini <rickyman7@...il.com>
Subject: [RFC PATCH 00/10] perf: add workqueue library and use it in synthetic-events
This patchset introduces a new utility library inside perf/util that
provides a workqueue abstraction loosely following the kernel
workqueue API.
The workqueue abstraction is made up of two components:
 - threadpool: manages a pool of threads. It is inspired by the
   prototype for threaded trace in perf-record from Alexey:
   https://lore.kernel.org/lkml/cover.1625227739.git.alexey.v.bayduraev@linux.intel.com/
 - workqueue: manages a shared queue and provides the worker
   implementation.
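
As a rough illustration of the shared-queue component (not code from the patchset; every name below is hypothetical and the worker threads are created directly instead of coming from a separate threadpool), a minimal pthread-based queue with kernel-style queue/flush operations could look like this:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* All names are hypothetical; this is not the patchset's API. */
struct sk_work {
	void (*func)(struct sk_work *work);
	void *arg;
	struct sk_work *next;
};

#define SK_MAX_WORKERS 16

struct sk_workqueue {
	pthread_mutex_t lock;
	pthread_cond_t more;	/* work was queued, or stop was requested */
	pthread_cond_t idle;	/* pending dropped to zero */
	struct sk_work *head, *tail;
	size_t pending;		/* items queued or still running */
	int stop, nr_workers;
	pthread_t workers[SK_MAX_WORKERS];
};

static void *sk_worker(void *p)
{
	struct sk_workqueue *wq = p;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		struct sk_work *w;

		while (!wq->head && !wq->stop)
			pthread_cond_wait(&wq->more, &wq->lock);
		if (!wq->head)
			break;	/* stop requested and nothing left to run */
		w = wq->head;
		wq->head = w->next;
		if (!wq->head)
			wq->tail = NULL;
		pthread_mutex_unlock(&wq->lock);
		w->func(w);	/* run the item without holding the lock */
		pthread_mutex_lock(&wq->lock);
		if (--wq->pending == 0)
			pthread_cond_broadcast(&wq->idle);
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

static int sk_workqueue_init(struct sk_workqueue *wq, int nr_workers)
{
	if (nr_workers < 1 || nr_workers > SK_MAX_WORKERS)
		return -1;
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->more, NULL);
	pthread_cond_init(&wq->idle, NULL);
	wq->head = wq->tail = NULL;
	wq->pending = 0;
	wq->stop = wq->nr_workers = 0;
	while (wq->nr_workers < nr_workers &&
	       !pthread_create(&wq->workers[wq->nr_workers], NULL, sk_worker, wq))
		wq->nr_workers++;
	return wq->nr_workers > 0 ? 0 : -1;
}

/* Append one item to the shared queue and wake one worker. */
static void sk_queue_work(struct sk_workqueue *wq, struct sk_work *w)
{
	assert(w->func);
	w->next = NULL;
	pthread_mutex_lock(&wq->lock);
	if (wq->tail)
		wq->tail->next = w;
	else
		wq->head = w;
	wq->tail = w;
	wq->pending++;
	pthread_cond_signal(&wq->more);
	pthread_mutex_unlock(&wq->lock);
}

/* Block until every queued item has finished running. */
static void sk_flush_workqueue(struct sk_workqueue *wq)
{
	pthread_mutex_lock(&wq->lock);
	while (wq->pending)
		pthread_cond_wait(&wq->idle, &wq->lock);
	pthread_mutex_unlock(&wq->lock);
}

static void sk_workqueue_destroy(struct sk_workqueue *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stop = 1;
	pthread_cond_broadcast(&wq->more);
	pthread_mutex_unlock(&wq->lock);
	while (wq->nr_workers > 0)
		pthread_join(wq->workers[--wq->nr_workers], NULL);
	pthread_cond_destroy(&wq->idle);
	pthread_cond_destroy(&wq->more);
	pthread_mutex_destroy(&wq->lock);
}

/* Sample work function: sets the int pointed to by arg to 1. */
static void sk_mark_done(struct sk_work *w)
{
	*(int *)w->arg = 1;
}
```

A caller would initialize the queue once, queue any number of `struct sk_work` items, call `sk_flush_workqueue()` to wait for them, and finally destroy the queue. Because every worker contends on one lock, this shape also shows where the per-item overhead comes from.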
On top of the workqueue, a simple parallel-for utility is implemented
and then showcased in synthetic-events.c, where it replaces the threads
that were previously created manually with pthread.
In some experiments with perf bench, I can see that the new workqueue
has a higher overhead than manual creation of threads, but it
partitions work among threads more effectively, yielding better
results as the number of threads grows.
Furthermore, the overhead can be tuned by changing `work_size`
(currently 1), i.e. the number of dirents a thread processes before
taking the lock to get the next work item.
I experimented with different sizes but, while bigger sizes reduce overhead
as expected, they do not scale as well to more threads.
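
To make the `work_size` trade-off concrete, here is a minimal sketch of chunked index claiming (not code from the patchset; all names are hypothetical, and for brevity it spawns plain pthreads per call instead of reusing a pool). Each thread takes the lock once per `work_size` indices, so a larger value means fewer lock acquisitions but coarser load balancing:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names; only the work_size idea comes from the text above. */
typedef void (*pfor_fn)(size_t idx, void *arg);

struct pfor_state {
	pthread_mutex_t lock;	/* protects next */
	size_t next;		/* first index not yet handed out */
	size_t end;		/* one past the last index */
	size_t work_size;	/* indices claimed per lock acquisition */
	pfor_fn fn;
	void *arg;
};

static void *pfor_worker(void *p)
{
	struct pfor_state *st = p;

	for (;;) {
		size_t from, to, i;

		/* Claim the next chunk of up to work_size indices. */
		pthread_mutex_lock(&st->lock);
		from = st->next;
		assert(from <= st->end);
		to = st->end - from < st->work_size ? st->end
						    : from + st->work_size;
		st->next = to;
		pthread_mutex_unlock(&st->lock);

		if (from == to)
			break;	/* nothing left to claim */
		for (i = from; i < to; i++)
			st->fn(i, st->arg);
	}
	return NULL;
}

#define PFOR_MAX_THREADS 64

/* Run fn(i, arg) for every i in [0, n) on up to nr_threads threads. */
static int pfor_run(size_t n, size_t work_size, int nr_threads,
		    pfor_fn fn, void *arg)
{
	pthread_t tids[PFOR_MAX_THREADS];
	struct pfor_state st = {
		.next = 0, .end = n, .work_size = work_size,
		.fn = fn, .arg = arg,
	};
	int i, started = 0;

	if (!fn || !work_size || nr_threads < 1 ||
	    nr_threads > PFOR_MAX_THREADS)
		return -1;

	pthread_mutex_init(&st.lock, NULL);
	for (i = 0; i < nr_threads; i++) {
		if (pthread_create(&tids[i], NULL, pfor_worker, &st))
			break;
		started++;
	}
	if (!started)
		pfor_worker(&st);	/* fall back to the calling thread */
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&st.lock);
	return 0;
}

/* Sample body: counts how many times each index was visited. */
static void pfor_mark(size_t idx, void *arg)
{
	((unsigned char *)arg)[idx]++;
}
```

With `work_size` set to 1 every index costs one lock round trip, which matches the higher overhead described above, while a large `work_size` can leave some threads idle once the last chunks have been handed out.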
I tried to keep the patchset as simple as possible, deferring possible
improvements and features to future work.
To name a few:
 - to achieve better performance, we could consider using
   work-stealing instead of a common queue.
 - affinities in the thread pool, as in Alexey's prototype for
   perf-record. Doing so would enable reusing the same threadpool for
   different purposes (evlist open, threaded trace, synthetic threads),
   avoiding having to spin up threads multiple times.
 - resizable threadpool, e.g. for lazy spawning of threads.
@Arnaldo
Since I wanted the workqueue to provide an API similar to the kernel's
workqueue, I followed the naming style I found there, instead of the
usual object__method style that is typically found in perf.
Let me know if you'd like me to follow perf style instead.
Thanks,
Riccardo
Riccardo Mancini (10):
  perf workqueue: threadpool creation and destruction
  perf tests: add test for workqueue
  perf workqueue: add threadpool start and stop functions
  perf workqueue: add threadpool execute and wait functions
  perf workqueue: add sparse annotation header
  perf workqueue: introduce workqueue struct
  perf workqueue: implement worker thread and management
  perf workqueue: add queue_work and flush_workqueue functions
  perf workqueue: add utility to execute a for loop in parallel
  perf synthetic-events: use workqueue parallel_for
 tools/perf/tests/Build                 |   1 +
 tools/perf/tests/builtin-test.c        |   9 +
 tools/perf/tests/tests.h               |   3 +
 tools/perf/tests/workqueue.c           | 453 +++++++++++++++++
 tools/perf/util/Build                  |   1 +
 tools/perf/util/synthetic-events.c     | 131 +++--
 tools/perf/util/workqueue/Build        |   2 +
 tools/perf/util/workqueue/sparse.h     |  21 +
 tools/perf/util/workqueue/threadpool.c | 516 ++++++++++++++++++++
 tools/perf/util/workqueue/threadpool.h |  29 ++
 tools/perf/util/workqueue/workqueue.c  | 642 +++++++++++++++++++++++++
 tools/perf/util/workqueue/workqueue.h  |  38 ++
 12 files changed, 1771 insertions(+), 75 deletions(-)
 create mode 100644 tools/perf/tests/workqueue.c
 create mode 100644 tools/perf/util/workqueue/Build
 create mode 100644 tools/perf/util/workqueue/sparse.h
 create mode 100644 tools/perf/util/workqueue/threadpool.c
 create mode 100644 tools/perf/util/workqueue/threadpool.h
 create mode 100644 tools/perf/util/workqueue/workqueue.c
 create mode 100644 tools/perf/util/workqueue/workqueue.h
--
2.31.1