Message-ID: <185c8c12-4d6d-41a2-bb04-dfe1d00d01c4@paulmck-laptop>
Date: Thu, 31 Jul 2025 16:38:52 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Yuzhuo Jing <yuzhuo@...gle.com>
Cc: Ian Rogers <irogers@...gle.com>, Yuzhuo Jing <yzj@...ch.edu>,
Jonathan Corbet <corbet@....net>,
Davidlohr Bueso <dave@...olabs.net>,
Josh Triplett <josh@...htriplett.org>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joelagnelf@...dia.com>,
Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
Arnd Bergmann <arnd@...db.de>,
Frank van der Linden <fvdl@...gle.com>, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, rcu@...r.kernel.org
Subject: Re: [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU
affinity offset
On Tue, Jul 29, 2025 at 07:23:43PM -0700, Yuzhuo Jing wrote:
> In an effort to add RCU benchmarks to the perf tool and to improve
> the bare-metal rcuscale tests, this patch series adds several auxiliary
> features useful for testing tools.
>
> This series introduces a few rcuscale options:
> * writer_no_print: skip writer duration printing during shutdown, but
> instead let users read from the new "writer_durations" debugfs file.
> This drastically improves cleanup speed.

But existing scripts running something like this will continue to
work, correct? (It looks like they do, just checking.)

	tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuscale --allcpus --duration 5

Don't get me wrong, your debugfs read-out performance increase looks
quite good, but these tests run in a guest OS with minimal userspace.
And by "minimal", I mean that they run out of an initrd having a root
filesystem consisting of a single statically linked "init" program. ;-)

> * block_start: an option to hold all worker threads until the new
> debugfs "should_start" file is written.
> * {reader,writer,kfree}_cpu_offset: the starting value of CPU affinity
> for each type of thread. This can be used to avoid scheduling
> different types of threads on the same CPU. The 4th patch in this
> series shows drastic performance differences with and without overlap.

The usual use cases run only writers except for stress tests, but this
seems like a good capability.

> This patch series creates an "rcuscale" folder in debugfs, containing
> the following files:
> * writer_durations: a CSV formatted file containing writer id and
> writer durations.
> * {reader,writer,kfree}_tasks: the list of kernel task PIDs for
> external tools to attach to.
> * should_start: a writable file to signal the start of the experiment,
> used in conjunction with the new "block_start" option.
> * test_complete: a readable file to indicate whether the experiment has
> finished or not.
>
> RFCs:
> * Should those new files reside in debugfs or in procfs?

New files in procfs face serious scrutiny, so your choice of debugfs
is a good one.

> * What format should be used for the writer_durations file, and what
> documentation should be updated for the file format definition?

Back in the old days, I would have insisted on space- or tab-separated
fields. But gawk now supports a --csv flag, so I don't feel strongly
about this.

> * In the 4th patch, we see different characteristics between the
> overlapping and non-overlapping cases. Current rcuscale creates nr_cpu
> readers and nr_cpu writers, thus scheduling 2*nr_cpu tasks on nr_cpu
> CPUs. Should we consider changes to this behavior? Or should we add
> automatic conflict resolution when the total number of threads is
> <= nr_cpu?

The theory back in the day was that the updater would spend enough time
blocked that this would not matter. However, you have shown that it
clearly does matter.

Except that running the reader and writer on the same CPU seems to
*improve* grace-period latency, with a P99 duration of 121,004
microseconds for the overlapping case (your first patch 4/4 experiment)
versus 218,018 microseconds for the non-overlapping case. Since shorter
grace periods are usually considered better, this suggests better
performance with the reader and writer running on the same CPU.

Or am I misreading your commit log?

It would not be too surprising for the overlapping case to provide
faster grace periods because you are running PREEMPT=n and the writer
kthread would force context switches more frequently. But I figured
that I should check.

> Thank you!
>
> Yuzhuo Jing (4):
> rcuscale: Create debugfs file for writer durations
> rcuscale: Create debugfs files for worker thread PIDs
> rcuscale: Add file based start/finish control

This does not apply to the dev branch of my -rcu tree, which is not too
surprising because kernel-parameters.txt is subject to change. But when
you repost to fix the bug that the kernel test robot detected, could you
please let me know what mainline version you are developing against?
That would allow me to apply it there and then to rebase and resolve
conflicts as needed.

Thanx, Paul

> rcuscale: Add CPU affinity offset options
>
> .../admin-guide/kernel-parameters.txt | 29 ++
> kernel/rcu/rcuscale.c | 361 +++++++++++++++++-
> 2 files changed, 377 insertions(+), 13 deletions(-)
>
> --
> 2.50.1.552.g942d659e1b-goog
>