Date: Thu, 22 Jul 2021 18:15:19 +0200
From: Riccardo Mancini <rickyman7@...il.com>
To: Jiri Olsa <jolsa@...hat.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
	Ian Rogers <irogers@...gle.com>,
	Namhyung Kim <namhyung@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	Mark Rutland <mark.rutland@....com>,
	linux-kernel@...r.kernel.org,
	linux-perf-users@...r.kernel.org
Subject: Re: [RFC PATCH 00/10] perf: add workqueue library and use it in synthetic-events

Hi Jiri,

On Mon, 2021-07-19 at 23:13 +0200, Jiri Olsa wrote:
> On Tue, Jul 13, 2021 at 02:11:11PM +0200, Riccardo Mancini wrote:
> > This patchset introduces a new utility library inside perf/util, which
> > provides a work queue abstraction, which loosely follows the Kernel
> > workqueue API.
> >
> > The workqueue abstraction is made up by two components:
> >  - threadpool: which takes care of managing a pool of threads. It is
> >    inspired by the prototype for threaded trace in perf-record from Alexey:
> >
> >    https://lore.kernel.org/lkml/cover.1625227739.git.alexey.v.bayduraev@linux.intel.com/
> >  - workqueue: manages a shared queue and provides the workers
> >    implementation.
> >
> > On top of the workqueue, a simple parallel-for utility is implemented
> > which is then showcased in synthetic-events.c, replacing the previous
> > manual pthread-created threads.
> >
> > Through some experiments with perf bench, I can see how the new
> > workqueue has a higher overhead compared to manual creation of threads,
> > but is able to more effectively partition work among threads, yielding
> > a better result with more threads.
> > Furthermore, the overhead could be configured by changing the
> > `work_size` (currently 1), aka the number of dirents that are
> > processed by a thread before grabbing a lock to get the new work item.
> > I experimented with different sizes but, while bigger sizes reduce overhead
> > as expected, they do not scale as well to more threads.
> >
> > I tried to keep the patchset as simple as possible, deferring possible
> > improvements and features to future work.
> > Naming a few:
> >  - in order to achieve a better performance, we could consider using
> >    work-stealing instead of a common queue.
> >  - affinities in the thread pool, as in Alexey prototype for
> >    perf-record. Doing so would enable reusing the same threadpool for
> >    different purposes (evlist open, threaded trace, synthetic threads),
> >    avoiding having to spin up threads multiple times.
> >  - resizable threadpool, e.g. for lazy spawning of threads.
> >
> > @Arnaldo
> > Since I wanted the workqueue to provide a similar API to the Kernel's
> > workqueue, I followed the naming style I found there, instead of the
> > usual object__method style that is typically found in perf.
> > Let me know if you'd like me to follow perf style instead.
> >
> > Thanks,
> > Riccardo
> >
> > Riccardo Mancini (10):
> >   perf workqueue: threadpool creation and destruction
> >   perf tests: add test for workqueue
> >   perf workqueue: add threadpool start and stop functions
> >   perf workqueue: add threadpool execute and wait functions
> >   perf workqueue: add sparse annotation header
> >   perf workqueue: introduce workqueue struct
> >   perf workqueue: implement worker thread and management
> >   perf workqueue: add queue_work and flush_workqueue functions
> >   perf workqueue: add utility to execute a for loop in parallel
> >   perf synthetic-events: use workqueue parallel_for
>
> looks great, would it make sense to put this to libperf?

I don't know about libperf in particular.
The idea is to start using it in perf and, if everything goes well, to put
it in lib/ so that everyone interested in it could just include it.
Since I'm looking for other parts where a workqueue could be useful, if you
know of some in libperf, I could try having a look at them too.

Riccardo

>
> jirka
>
> >
> >  tools/perf/tests/Build                 |   1 +
> >  tools/perf/tests/builtin-test.c        |   9 +
> >  tools/perf/tests/tests.h               |   3 +
> >  tools/perf/tests/workqueue.c           | 453 +++++++++++++++++
> >  tools/perf/util/Build                  |   1 +
> >  tools/perf/util/synthetic-events.c     | 131 +++--
> >  tools/perf/util/workqueue/Build        |   2 +
> >  tools/perf/util/workqueue/sparse.h     |  21 +
> >  tools/perf/util/workqueue/threadpool.c | 516 ++++++++++++++++++++
> >  tools/perf/util/workqueue/threadpool.h |  29 ++
> >  tools/perf/util/workqueue/workqueue.c  | 642 +++++++++++++++++++++++++
> >  tools/perf/util/workqueue/workqueue.h  |  38 ++
> >  12 files changed, 1771 insertions(+), 75 deletions(-)
> >  create mode 100644 tools/perf/tests/workqueue.c
> >  create mode 100644 tools/perf/util/workqueue/Build
> >  create mode 100644 tools/perf/util/workqueue/sparse.h
> >  create mode 100644 tools/perf/util/workqueue/threadpool.c
> >  create mode 100644 tools/perf/util/workqueue/threadpool.h
> >  create mode 100644 tools/perf/util/workqueue/workqueue.c
> >  create mode 100644 tools/perf/util/workqueue/workqueue.h
> >
> > --
> > 2.31.1
> >
>
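
For readers unfamiliar with the `work_size` trade-off described in the cover letter, here is a rough sketch of a parallel-for in which each thread claims `work_size` consecutive indices per lock acquisition. It is only an illustration under assumptions: every name in it (pf_parallel_for, pf_state, pf_worker, pf_count_hit) is hypothetical and not taken from the patchset, and it spawns plain pthreads on each call instead of reusing a threadpool and work queue as the series does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* All names below are hypothetical; this is not the patchset's API. */
typedef void (*pf_func_t)(int idx, void *args);

struct pf_state {
	pthread_mutex_t lock;	/* protects next */
	int next;		/* first index not yet handed out */
	int to;			/* end of the range, exclusive */
	int work_size;		/* indices claimed per lock acquisition */
	pf_func_t func;
	void *args;
};

static void *pf_worker(void *arg)
{
	struct pf_state *st = arg;

	for (;;) {
		int start, end, i;

		/* Claim the next chunk of up to work_size indices. */
		pthread_mutex_lock(&st->lock);
		start = st->next;
		end = (st->to - start > st->work_size) ?
			start + st->work_size : st->to;
		st->next = end;
		pthread_mutex_unlock(&st->lock);

		assert(start <= end);
		if (start == end)
			break;	/* range exhausted */

		/* Process the claimed chunk without holding the lock. */
		for (i = start; i < end; i++)
			st->func(i, st->args);
	}
	return NULL;
}

/*
 * Call func(i, args) for every i in [from, to) using up to nr_threads
 * threads. Returns 0 if at least one worker ran (the whole range is then
 * processed), -1 on bad arguments or if no thread could be created.
 */
int pf_parallel_for(int nr_threads, int from, int to, int work_size,
		    pf_func_t func, void *args)
{
	struct pf_state st = {
		.next = from, .to = to, .work_size = work_size,
		.func = func, .args = args,
	};
	pthread_t *threads;
	int i, started = 0;

	if (nr_threads <= 0 || work_size <= 0 || from > to || !func)
		return -1;

	threads = calloc(nr_threads, sizeof(*threads));
	if (!threads)
		return -1;
	pthread_mutex_init(&st.lock, NULL);

	for (i = 0; i < nr_threads; i++) {
		if (pthread_create(&threads[i], NULL, pf_worker, &st))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	pthread_mutex_destroy(&st.lock);
	free(threads);
	return started > 0 ? 0 : -1;
}

/* Example callback: counts how many times each index was visited. */
void pf_count_hit(int idx, void *args)
{
	((int *)args)[idx]++;
}
```

With work_size set to 1 the lock is taken once per item, which gives the finest-grained balancing at the highest locking cost; a larger value takes the lock less often but hands out coarser chunks, which is one plausible reason for the scaling behaviour reported above.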