[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wj5A-fO+GnfwqGpXhFbfpS4+_8xU+dnXkSx+0AfwBYrxA@mail.gmail.com>
Date: Thu, 26 Dec 2024 11:02:45 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: WangYuli <wangyuli@...ontech.com>
Cc: viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
yushengjin@...ontech.com, zhangdandan@...ontech.com, guanwentao@...ontech.com,
zhanjun@...ontech.com, oliver.sang@...el.com, ebiederm@...ssion.com,
colin.king@...onical.com, josh@...htriplett.org, penberg@...helsinki.fi,
manfred@...orfullife.com, mingo@...e.hu, jes@....com, hch@....de,
aia21@...tab.net, arjan@...radead.org, jgarzik@...ox.com,
neukum@...hschaft.cup.uni-muenchen.de, oliver@...kum.name,
dada1@...mosbay.com, axboe@...nel.dk, axboe@...e.de, nickpiggin@...oo.com.au,
dhowells@...hat.com, nathans@....com, rolandd@...co.com, tytso@....edu,
bunk@...sta.de, pbadari@...ibm.com, ak@...ux.intel.com, ak@...e.de,
davem@...emloft.net, jsipek@...sunysb.edu, jens.axboe@...cle.com,
ramsdell@...re.org, hch@...radead.org, akpm@...ux-foundation.org,
randy.dunlap@...cle.com, efault@....de, rdunlap@...radead.org,
haveblue@...ibm.com, drepper@...hat.com, dm.n9107@...il.com, jblunck@...e.de,
davidel@...ilserver.org, mtk.manpages@...glemail.com,
linux-arch@...r.kernel.org, vda.linux@...glemail.com, jmorris@...ei.org,
serue@...ibm.com, hca@...ux.ibm.com, rth@...ddle.net, lethal@...ux-sh.org,
tony.luck@...el.com, heiko.carstens@...ibm.com, oleg@...hat.com,
andi@...stfloor.org, corbet@....net, crquan@...il.com, mszeredi@...e.cz,
miklos@...redi.hu, peterz@...radead.org, a.p.zijlstra@...llo.nl,
earl_chew@...lent.com, npiggin@...il.com, npiggin@...e.de, julia@...u.dk,
jaxboe@...ionio.com, nikai@...ai.net, dchinner@...hat.com, davej@...hat.com,
npiggin@...nel.dk, eric.dumazet@...il.com, tim.c.chen@...ux.intel.com,
xemul@...allels.com, tj@...nel.org, serge.hallyn@...onical.com,
gorcunov@...nvz.org, levinsasha928@...il.com, penberg@...nel.org,
amwang@...hat.com, bcrl@...ck.org, muthu.lkml@...il.com, muthur@...il.com,
mjt@....msk.ru, alan@...rguk.ukuu.org.uk, raven@...maw.net, thomas@...3r.de,
will.deacon@....com, will@...nel.org, josef@...hat.com,
anatol.pomozov@...il.com, koverstreet@...gle.com, zab@...hat.com,
balbi@...com, gregkh@...uxfoundation.org, mfasheh@...e.com,
jlbec@...lplan.org, rusty@...tcorp.com.au, asamymuthupa@...ron.com,
smani@...ron.com, sbradshaw@...ron.com, jmoyer@...hat.com, sim@...tway.ca,
ia@...udflare.com, dmonakhov@...nvz.org, ebiggers3@...il.com,
socketpair@...il.com, penguin-kernel@...ove.sakura.ne.jp, w@....eu,
kirill.shutemov@...ux.intel.com, mhocko@...e.com, vdavydov.dev@...il.com,
vdavydov@...tuozzo.com, hannes@...xchg.org, mhocko@...nel.org,
minchan@...nel.org, deepa.kernel@...il.com, arnd@...db.de, balbi@...nel.org,
swhiteho@...hat.com, konishi.ryusuke@....ntt.co.jp, dsterba@...e.com,
vegard.nossum@...cle.com, axboe@...com, pombredanne@...b.com,
tglx@...utronix.de, joe.lawrence@...hat.com, mpatocka@...hat.com,
mcgrof@...nel.org, keescook@...omium.org, linux@...inikbrodowski.net,
jannh@...gle.com, shakeelb@...gle.com, guro@...com, willy@...radead.org,
khlebnikov@...dex-team.ru, kirr@...edi.com, stern@...land.harvard.edu,
elver@...gle.com, parri.andrea@...il.com, paulmck@...nel.org,
rasibley@...hat.com, jstancek@...hat.com, avagin@...il.com, cai@...hat.com,
josef@...icpanda.com, hare@...e.de, colyli@...e.de, johannes@...solutions.net,
sspatil@...roid.com, alex_y_xu@...oo.ca, mgorman@...hsingularity.net,
gor@...ux.ibm.com, jhubbard@...dia.com, andriy.shevchenko@...ux.intel.com,
crope@....fi, yzaikin@...gle.com, bfields@...ldses.org, jlayton@...nel.org,
kernel@...force.de, steve@....org, nixiaoming@...wei.com,
0x7f454c46@...il.com, kuniyu@...zon.co.jp, alexander.h.duyck@...el.com,
kuni1840@...il.com, soheil@...gle.com, sridhar.samudrala@...el.com,
Vincenzo.Frascino@....com, chuck.lever@...cle.com, Kevin.Brodsky@....com,
Szabolcs.Nagy@....com, David.Laight@...lab.com, Mark.Rutland@....com,
linux-morello@...lists.linaro.org, Luca.Vizzarro@....com,
max.kellermann@...os.com, adobriyan@...il.com, lukas@...auer.dev,
j.granados@...sung.com, djwong@...nel.org, kent.overstreet@...ux.dev,
linux@...ssschuh.net, kstewart@...icios.com
Subject: Re: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping
processes during pipe read/write
On Wed, 25 Dec 2024 at 01:44, WangYuli <wangyuli@...ontech.com> wrote:
>
>
> +config PIPE_SKIP_SLEEPER
> + bool "Skip sleeping processes during pipe read/write"
If the optimization is correct, there is no point to having a config option.
If the optimization is incorrect, there is no point to having the code.
Either way, there's no way we'd ever have a config option for this.
> +pipe_check_wq_has_sleeper(struct wait_queue_head *wq_head)
> +{
> + return wq_has_sleeper(wq_head);
So generally, the reason this is buggy is that it's racy due to either
CPU memory ordering issues or simply because the sleeper is going to
sleep but hasn't *quite* added itself to the wait queue.
Which is why the wakeup path takes the wq head lock.
Which is the only thing you are actually optimizing away.
> + if (was_full && pipe_check_wq_has_sleeper(&pipe->wr_wait))
> wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
End result: you need to explain why the race cannot exist.
And I think your patch is simply buggy because the race can in fact
exist, because there is no other lock than the waitqueue lock that
enforces memory ordering and protects against the "somebody is just
about to sleep" race.
That said, *if* all the wait queue work was done under the pipe mutex,
we could use the pipe mutex for exclusion instead.
That's actually how it kind of used to work long long ago (not really
- but close: it depended on the BKL originally, and adding waiters to
the wait-queues early before all the tests for full/empty, and
avoiding the races that way)
But now waiters use "wait_event_interruptible_exclusive()" explicitly
outside the pipe mutex, so the waiters and wakers aren't actually
serialized.
And no, you are unlikely to see the races in practice (particularly
the memory ordering ones). So you have to *think* about them, not test
them.
Linus
Powered by blists - more mailing lists