Message-ID: <CA+icZUXN-6TRq1CO3O5i+0WAs91mk8iM-kASgPCjMzVv9yragA@mail.gmail.com>
Date: Tue, 17 Jan 2023 07:54:46 +0100
From: Sedat Dilek <sedat.dilek@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Al Viro <viro@...iv.linux.org.uk>,
Matthew Wilcox <willy@...radead.org>,
maobibo <maobibo@...ngson.cn>,
Hongchen Zhang <zhanghongchen@...ngson.cn>,
David Howells <dhowells@...hat.com>,
"Christian Brauner (Microsoft)" <brauner@...nel.org>,
Luis Chamberlain <mcgrof@...nel.org>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Eric Dumazet <edumazet@...gle.com>,
"Fabio M. De Francesco" <fmdefrancesco@...il.com>,
Christophe JAILLET <christophe.jaillet@...adoo.fr>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock
On Mon, Jan 16, 2023 at 11:16 PM Andrew Morton
<akpm@...ux-foundation.org> wrote:
>
> On Mon, 16 Jan 2023 21:10:37 +0000 Al Viro <viro@...iv.linux.org.uk> wrote:
>
> > On Mon, Jan 16, 2023 at 04:38:01AM +0000, Matthew Wilcox wrote:
> > > On Mon, Jan 16, 2023 at 11:16:13AM +0800, maobibo wrote:
> > > > Hongchen,
> > > >
> > > > I have had a glance at this patch; it simply replaces the
> > > > spinlock_irqsave with a mutex lock. There may be a performance
> > > > improvement with two processes competing for the pipe, however
> > > > for N processes there will be more complex context switches
> > > > and IPI interrupts.
> > > >
> > > > Can you find some cases with more than 2 processes competing
> > > > for the pipe, rather than only unixbench?
> > >
> > > What real applications have pipes with more than 1 writer & 1 reader?
> > > I'm OK with slowing down the weird cases if the common cases go faster.
> >
> > From commit 0ddad21d3e99c743a3aa473121dc5561679e26bb:
> > While this isn't a common occurrence in the traditional "use a pipe as a
> > data transport" case, where you typically only have a single reader and
> > a single writer process, there is one common special case: using a pipe
> > as a source of "locking tokens" rather than for data communication.
> >
> > In particular, the GNU make jobserver code ends up using a pipe as a way
> > to limit parallelism, where each job consumes a token by reading a byte
> > from the jobserver pipe, and releases the token by writing a byte back
> > to the pipe.
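
For illustration, here is a minimal user-space sketch of this "locking
tokens" pattern, in the spirit of the GNU make jobserver. The
token_get()/token_put() helpers and the token count are made up for this
sketch and are not taken from make itself:

/* Hypothetical sketch: a pipe used as a pool of locking tokens. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static int token_fd[2];               /* [0] = read end, [1] = write end */

static void token_pool_init(int tokens)
{
	if (pipe(token_fd) < 0) {
		perror("pipe");
		exit(1);
	}
	for (int i = 0; i < tokens; i++)  /* one byte per available job slot */
		write(token_fd[1], "+", 1);
}

static char token_get(void)           /* consume a token; blocks until one is free */
{
	char c;
	while (read(token_fd[0], &c, 1) != 1)
		;
	return c;
}

static void token_put(char c)         /* release the slot for another job */
{
	write(token_fd[1], &c, 1);
}

int main(void)
{
	token_pool_init(4);               /* allow up to 4 concurrent "jobs" */
	char t = token_get();
	/* ... run the job ... */
	token_put(t);
	return 0;
}

With N such jobs running concurrently, every acquire and release is a
one-byte read or write on the same pipe, which is why the locking in
pipe_read()/pipe_write() dominates this kind of workload.
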
>
> The author has tested this patch with Linus's test code from 0ddad21d3e
> and the results were OK
> (https://lkml.kernel.org/r/c3cbede6-f19e-3333-ba0f-d3f005e5d599@loongson.cn).
>
Yesterday, I had some time to test with and without this patch on my
Debian/unstable AMD64 box.
[ TEST-CASE ]
BASE: Linux v6.2-rc4
PATCH: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock
TEST-CASE: Taken from commit 0ddad21d3e99
RUN: gcc-12 -o 0ddad21d3e99 0ddad21d3e99.c
Link: https://lore.kernel.org/all/20230107012324.30698-1-zhanghongchen@loongson.cn/
Link: https://git.kernel.org/linus/0ddad21d3e99
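
For reference, a rough sketch of what such a token-passing stress test looks
like (this is not the actual code from the commit; the attached
0ddad21d3e99.c is the authoritative test case, and the NPROC/LOOPS values
below are arbitrary):

/* Hypothetical sketch of a multi-process pipe token-passing benchmark. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define NPROC 16                       /* arbitrary: number of competing processes */
#define LOOPS 100000                   /* arbitrary: iterations per process */

int main(void)
{
	int fd[2];
	char c = 'x';

	if (pipe(fd) < 0) {
		perror("pipe");
		return 1;
	}
	write(fd[1], &c, 1);               /* put a single token in flight */

	for (int i = 0; i < NPROC; i++) {
		if (fork() == 0) {
			for (int j = 0; j < LOOPS; j++) {
				read(fd[0], &c, 1);    /* grab the token ...   */
				write(fd[1], &c, 1);   /* ... and pass it back */
			}
			_exit(0);
		}
	}
	for (int i = 0; i < NPROC; i++)
		wait(NULL);
	return 0;
}

Run under "perf stat --repeat=10" as below, such a test mostly measures how
the processes contend on the pipe lock and how often they get switched and
migrated, which is what the counters in the results reflect.
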
[ INSTRUCTIONS ]
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
10 runs: /usr/bin/perf stat --repeat=10 ./0ddad21d3e99
echo 1 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
[ BEFORE ]
Performance counter stats for './0ddad21d3e99' (10 runs):

         23.985,50 msec task-clock                #    3,246 CPUs utilized            ( +- 0,20% )
         1.112.822      context-switches          #   46,696 K/sec                    ( +- 0,30% )
           403.033      cpu-migrations            #   16,912 K/sec                    ( +- 0,28% )
             1.508      page-faults               #   63,278 /sec                     ( +- 2,95% )
    39.436.000.959      cycles                    #    1,655 GHz                      ( +- 0,22% )
    29.364.329.413      stalled-cycles-frontend   #   74,91% frontend cycles idle     ( +- 0,24% )
    22.139.448.400      stalled-cycles-backend    #   56,48% backend cycles idle      ( +- 0,23% )
    18.565.538.523      instructions              #    0,47  insn per cycle
                                                  #    1,57  stalled cycles per insn  ( +- 0,17% )
     4.059.885.546      branches                  #  170,359 M/sec                    ( +- 0,17% )
        59.991.226      branch-misses             #    1,48% of all branches          ( +- 0,19% )

            7,3892 +- 0,0127 seconds time elapsed  ( +- 0,17% )
[ AFTER ]
Performance counter stats for './0ddad21d3e99' (10 runs):

         24.175,94 msec task-clock                #    3,362 CPUs utilized            ( +- 0,11% )
         1.139.152      context-switches          #   47,119 K/sec                    ( +- 0,12% )
           407.994      cpu-migrations            #   16,876 K/sec                    ( +- 0,26% )
             1.555      page-faults               #   64,319 /sec                     ( +- 3,11% )
    40.904.849.091      cycles                    #    1,692 GHz                      ( +- 0,13% )
    30.587.623.034      stalled-cycles-frontend   #   74,84% frontend cycles idle     ( +- 0,15% )
    23.145.533.537      stalled-cycles-backend    #   56,63% backend cycles idle      ( +- 0,16% )
    18.762.964.037      instructions              #    0,46  insn per cycle
                                                  #    1,63  stalled cycles per insn  ( +- 0,11% )
     4.057.182.849      branches                  #  167,817 M/sec                    ( +- 0,09% )
        63.887.806      branch-misses             #    1,58% of all branches          ( +- 0,25% )

           7,19157 +- 0,00644 seconds time elapsed  ( +- 0,09% )
[ RESULT ]
seconds time elapsed: -2,67% (7,3892 s before vs. 7,19157 s after)
The test-case C file is attached, and in case the above lines got truncated
I have attached them as a README file as well.
Feel free to add a...
Tested-by: Sedat Dilek <sedat.dilek@...il.com> # LLVM v15.0.3 (x86-64)
If you need further information, please let me know.
-Sedat-
> I've been stalling on this patch until Linus gets back to his desk,
> which now appears to have happened.
>
> Hongchen, when convenient, please capture this discussion (as well as
> the testing results with Linus's sample code) in the changelog and send
> us a v4, with Linus on cc?
>
[ Attachments: "0ddad21d3e99.c" (text/x-csrc, 787 bytes) and "README_zhanghongchen-pipe-v3-0ddad21d3e99" (application/octet-stream, 3234 bytes) ]