Message-ID: <ced9c08c-a017-495f-978b-0c4d13992e5e@linux.ibm.com>
Date: Thu, 10 Apr 2025 20:22:08 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-kernel@...r.kernel.org
Cc: André Almeida <andrealmeid@...lia.com>,
Darren Hart <dvhart@...radead.org>,
Davidlohr Bueso <dave@...olabs.net>, Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Valentin Schneider <vschneid@...hat.com>,
Waiman Long <longman@...hat.com>,
"Liang, Kan" <kan.liang@...ux.intel.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Ian Rogers <irogers@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Namhyung Kim <namhyung@...nel.org>, linux-perf-users@...r.kernel.org
Subject: Re: [PATCH v11 19/19] futex: Allow to make the private hash
immutable.
Hi Sebastian.
On 4/7/25 21:27, Sebastian Andrzej Siewior wrote:
> My initial testing showed that
> perf bench futex hash
>
> reported less operations/sec with private hash. After using the same
> amount of buckets in the private hash as used by the global hash then
> the operations/sec were about the same.
>
> This changed once the private hash became resizable. This feature added
> a RCU section and reference counting via atomic inc+dec operation into
> the hot path.
> The reference counting can be avoided if the private hash is made
> immutable.
> Extend PR_FUTEX_HASH_SET_SLOTS by a fourth argument which denotes if the
> private hash should be made immutable. Once set (to true), a further
> resize is not allowed (same if set to global hash).
> Add PR_FUTEX_HASH_GET_IMMUTABLE which returns true if the hash can not
> be changed.
> Update "perf bench" suite.
>
It would be a good option for the application to decide whether it needs this.
With this option, the perf regression goes away when using the previous number of buckets.
Acked-by: Shrikanth Hegde <sshegde@...ux.ibm.com>
base:
./perf bench futex hash
Averaged 1556023 operations/sec (+- 0.08%), total secs = 10 <<-- 1.5M
with series:
./perf bench futex hash -b32768
Averaged 126499 operations/sec (+- 0.41%), total secs = 10 <<-- .12M
./perf bench futex hash -Ib32768
Averaged 1549339 operations/sec (+- 0.08%), total secs = 10 <<-- 1.5M
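
To put the three results above in relation, here is a minimal sketch. It is plain arithmetic on the figures quoted above; the variable names and the helper are illustrative only and are not part of the patch or of perf.

```python
# Throughput figures (operations/sec) copied from the runs above.
base = 1556023       # ./perf bench futex hash, before the series
resizable = 126499   # with the series, -b32768 (resizable private hash)
immutable = 1549339  # with the series, -Ib32768 (immutable private hash)


def relative(value, reference):
    """Return value as a fraction of reference."""
    return value / reference


print(f"resizable vs. base:      {relative(resizable, base):.1%}")
print(f"immutable vs. base:      {relative(immutable, base):.1%}")
print(f"immutable vs. resizable: {relative(immutable, resizable):.1f}x")
```

On these numbers the resizable hash reaches well under a tenth of the base throughput, while the immutable hash is within one percent of it.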
> For comparison, results of "perf bench futex hash -s":
> - Xeon CPU E5-2650, 2 NUMA nodes, total 32 CPUs:
> - Before introducing the task local hash
> shared Averaged 1.487.148 operations/sec (+- 0,53%), total secs = 10
> private Averaged 2.192.405 operations/sec (+- 0,07%), total secs = 10
>
> - With the series
> shared Averaged 1.326.342 operations/sec (+- 0,41%), total secs = 10
> -b128 Averaged 141.394 operations/sec (+- 1,15%), total secs = 10
> -Ib128 Averaged 851.490 operations/sec (+- 0,67%), total secs = 10
> -b8192 Averaged 131.321 operations/sec (+- 2,13%), total secs = 10
> -Ib8192 Averaged 1.923.077 operations/sec (+- 0,61%), total secs = 10
> 128 is the default allocation of hash buckets.
> 8192 was the previous amount of allocated hash buckets.
>
> - Xeon(R) CPU E7-8890 v3, 4 NUMA nodes, total 144 CPUs:
> - Before introducing the task local hash
> shared Averaged 1.810.936 operations/sec (+- 0,26%), total secs = 20
> private Averaged 2.505.801 operations/sec (+- 0,05%), total secs = 20
>
> - With the series
> shared Averaged 1.589.002 operations/sec (+- 0,25%), total secs = 20
> -b1024 Averaged 42.410 operations/sec (+- 0,20%), total secs = 20
> -Ib1024 Averaged 740.638 operations/sec (+- 1,51%), total secs = 20
> -b65536 Averaged 48.811 operations/sec (+- 1,35%), total secs = 20
> -Ib65536 Averaged 1.963.165 operations/sec (+- 0,18%), total secs = 20
> 1024 is the default allocation of hash buckets.
> 65536 was the previous amount of allocated hash buckets.
>
> Cc: "Liang, Kan" <kan.liang@...ux.intel.com>
> Cc: Adrian Hunter <adrian.hunter@...el.com>
> Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@...nel.org>
> Cc: Ian Rogers <irogers@...gle.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Jiri Olsa <jolsa@...nel.org>
> Cc: Mark Rutland <mark.rutland@....com>
> Cc: Namhyung Kim <namhyung@...nel.org>
> Cc: linux-perf-users@...r.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> ---
> include/linux/futex.h | 2 +-
> include/uapi/linux/prctl.h | 1 +
> kernel/futex/core.c | 42 ++++++++++++++++++++++----
> kernel/sys.c | 2 +-
> tools/include/uapi/linux/prctl.h | 1 +
> tools/perf/bench/futex-hash.c | 1 +
> tools/perf/bench/futex-lock-pi.c | 1 +
> tools/perf/bench/futex-requeue.c | 1 +
> tools/perf/bench/futex-wake-parallel.c | 1 +
> tools/perf/bench/futex-wake.c | 1 +
> tools/perf/bench/futex.c | 8 +++--
> tools/perf/bench/futex.h | 1 +
> 12 files changed, 51 insertions(+), 11 deletions(-)
>
nit: Does it make sense to split this patch into separate futex and perf patches?