Message-ID: <1df49d97-df0e-4471-9e40-a850b758d981@colorfullife.com>
Date: Fri, 27 Dec 2024 19:39:06 +0100
From: Manfred Spraul <manfred@...orfullife.com>
To: Oleg Nesterov <oleg@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
oliver.sang@...el.com, ebiederm@...ssion.com, colin.king@...onical.com,
josh@...htriplett.org, penberg@...helsinki.fi, mingo@...e.hu, jes@....com,
hch@....de, aia21@...tab.net, arjan@...radead.org, jgarzik@...ox.com,
neukum@...hschaft.cup.uni-muenchen.de, oliver@...kum.name,
dada1@...mosbay.com, axboe@...nel.dk, axboe@...e.de,
nickpiggin@...oo.com.au, dhowells@...hat.com, nathans@....com,
rolandd@...co.com, tytso@....edu, bunk@...sta.de, pbadari@...ibm.com,
ak@...ux.intel.com, ak@...e.de, davem@...emloft.net, jsipek@...sunysb.edu,
jens.axboe@...cle.com, ramsdell@...re.org, hch@...radead.org,
akpm@...ux-foundation.org, randy.dunlap@...cle.com, efault@....de,
rdunlap@...radead.org, haveblue@...ibm.com, drepper@...hat.com,
dm.n9107@...il.com, jblunck@...e.de, davidel@...ilserver.org,
mtk.manpages@...glemail.com, linux-arch@...r.kernel.org,
vda.linux@...glemail.com, jmorris@...ei.org, serue@...ibm.com,
hca@...ux.ibm.com, rth@...ddle.net, lethal@...ux-sh.org,
tony.luck@...el.com, heiko.carstens@...ibm.com, andi@...stfloor.org,
corbet@....net, crquan@...il.com, mszeredi@...e.cz, miklos@...redi.hu,
peterz@...radead.org, a.p.zijlstra@...llo.nl, earl_chew@...lent.com,
npiggin@...il.com, npiggin@...e.de, julia@...u.dk, jaxboe@...ionio.com,
nikai@...ai.net, dchinner@...hat.com, davej@...hat.com, npiggin@...nel.dk,
eric.dumazet@...il.com, tim.c.chen@...ux.intel.com, xemul@...allels.com,
tj@...nel.org, serge.hallyn@...onical.com, gorcunov@...nvz.org,
bcrl@...ck.org, alan@...rguk.ukuu.org.uk, will.deacon@....com,
will@...nel.org, zab@...hat.com, balbi@...com, gregkh@...uxfoundation.org,
rusty@...tcorp.com.au, socketpair@...il.com,
penguin-kernel@...ove.sakura.ne.jp, mhocko@...nel.org, axboe@...com,
tglx@...utronix.de, mcgrof@...nel.org, linux@...inikbrodowski.net,
willy@...radead.org, paulmck@...nel.org, kernel@...force.de,
linux-morello@...lists.linaro.org
Subject: Re: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping
processes during pipe read/write
Hi,
(I had to remove many Cc entries; my mail server refused to send the mail otherwise.)
On 12/26/24 9:11 PM, Oleg Nesterov wrote:
> I mostly agree, see my reply to this patch, but...
>
> On 12/26, Linus Torvalds wrote:
>> If the optimization is correct, there is no point to having a config option.
>>
>> If the optimization is incorrect, there is no point to having the code.
>>
>> Either way, there's no way we'd ever have a config option for this.
> Agreed,
>
>>> + if (was_full && pipe_check_wq_has_sleeper(&pipe->wr_wait))
>>> wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
>> End result: you need to explain why the race cannot exist.
> Agreed,
>
>> And I think your patch is simply buggy
> Agreed again, but probably my reasoning was wrong,
>
>> But now waiters use "wait_event_interruptible_exclusive()" explicitly
>> outside the pipe mutex, so the waiters and wakers aren't actually
>> serialized.
> This is what I don't understand... Could you spell ?
> I _think_ that
>
> wait_event_whatever(WQ, CONDITION);
> vs
>
> CONDITION = 1;
> if (wq_has_sleeper(WQ))
> wake_up_xxx(WQ, ...);
>
> is fine.
This pattern is documented in wait.h:
https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/wait.h#L96
Thus if there is an issue, then the documentation should be updated.
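
For illustration, a minimal sketch of that documented pattern (the names
my_cond, my_wq and the two helpers are made up for illustration and are
not the pipe code):

#include <linux/wait.h>
#include <linux/sched.h>

static DECLARE_WAIT_QUEUE_HEAD(my_wq);
static int my_cond;

/* waiter side: wait_event_interruptible() adds the task to my_wq and
 * sets the task state (with a barrier) before re-checking my_cond. */
static int my_wait(void)
{
	return wait_event_interruptible(my_wq, READ_ONCE(my_cond));
}

/* waker side: publish the condition first; wq_has_sleeper() issues
 * smp_mb() before testing waitqueue_active(), which pairs with the
 * barrier in the waiter's set_current_state(). */
static void my_wake(void)
{
	WRITE_ONCE(my_cond, 1);
	if (wq_has_sleeper(&my_wq))
		wake_up_interruptible(&my_wq);
}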
> Both wq_has_sleeper() and wait_event_whatever()->prepare_to_wait_event()
> have the necessary barriers to serialize the waiters and wakers.
>
> Damn I am sure I missed something ;)
Actually:
Does this work universally? I.e. can we add the optimization into
__wake_up()?
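Roughly something like the following hypothetical sketch (not the
attached patch; the real __wake_up() prototype and the
__wake_up_common_lock() internals may differ between versions):

/* Bail out before taking the waitqueue lock when nobody is queued;
 * wq_has_sleeper() does smp_mb() before the emptiness check, so it
 * pairs with the barrier on the waiter side as discussed above. */
int __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
	      int nr_exclusive, void *key)
{
	if (!wq_has_sleeper(wq_head))
		return 0;

	return __wake_up_common_lock(wq_head, mode, nr_exclusive, 0, key);
}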
But I do not understand this comment (from 2.6.0)
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/kernel/fork.c?h=v2.6.0&id=e220fdf7a39b54a758f4102bdd9d0d5706aa32a7
/*
 * Note: we use "set_current_state()" _after_ the wait-queue add,
 * because we need a memory barrier there on SMP, so that any
 * wake-function that tests for the wait-queue being active
 * will be guaranteed to see waitqueue addition _or_ subsequent
 * tests in this thread will see the wakeup having taken place.
 *
 * The spin_unlock() itself is semi-permeable and only protects
 * one way (it only protects stuff inside the critical region and
 * stops them from bleeding out - it would still allow subsequent
 * loads to move into the critical region).
 */
void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
	unsigned long flags;

	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
	spin_lock_irqsave(&q->lock, flags);
	if (list_empty(&wait->task_list))
		__add_wait_queue(q, wait);
	set_current_state(state);
	spin_unlock_irqrestore(&q->lock, flags);
}
set_current_state() now uses smp_store_mb(), which is a memory barrier
_after_ the store. Thus I do not see what enforces that the store
happens before the store performed by __add_wait_queue().
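
For reference, in current kernels those macros expand roughly as
follows (simplified generic fallbacks; architecture code may differ):

#define set_current_state(state_value)				\
	smp_store_mb(current->__state, (state_value))

/* generic fallback from asm-generic/barrier.h */
#define smp_store_mb(var, value)				\
	do { WRITE_ONCE(var, value); smp_mb(); } while (0)

/* i.e. the full barrier sits after the ->__state store; both the
 * __add_wait_queue() list store and the state store precede it. */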
--
Manfred
View attachment "0001-__wake_up_common_lock-Optimize-for-empty-wake-queues.patch" of type "text/x-patch" (2525 bytes)