lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1df49d97-df0e-4471-9e40-a850b758d981@colorfullife.com>
Date: Fri, 27 Dec 2024 19:39:06 +0100
From: Manfred Spraul <manfred@...orfullife.com>
To: Oleg Nesterov <oleg@...hat.com>,
 Linus Torvalds <torvalds@...ux-foundation.org>
Cc: viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
 oliver.sang@...el.com, ebiederm@...ssion.com, colin.king@...onical.com,
 josh@...htriplett.org, penberg@...helsinki.fi, mingo@...e.hu, jes@....com,
 hch@....de, aia21@...tab.net, arjan@...radead.org, jgarzik@...ox.com,
 neukum@...hschaft.cup.uni-muenchen.de, oliver@...kum.name,
 dada1@...mosbay.com, axboe@...nel.dk, axboe@...e.de,
 nickpiggin@...oo.com.au, dhowells@...hat.com, nathans@....com,
 rolandd@...co.com, tytso@....edu, bunk@...sta.de, pbadari@...ibm.com,
 ak@...ux.intel.com, ak@...e.de, davem@...emloft.net, jsipek@...sunysb.edu,
 jens.axboe@...cle.com, ramsdell@...re.org, hch@...radead.org,
 akpm@...ux-foundation.org, randy.dunlap@...cle.com, efault@....de,
 rdunlap@...radead.org, haveblue@...ibm.com, drepper@...hat.com,
 dm.n9107@...il.com, jblunck@...e.de, davidel@...ilserver.org,
 mtk.manpages@...glemail.com, linux-arch@...r.kernel.org,
 vda.linux@...glemail.com, jmorris@...ei.org, serue@...ibm.com,
 hca@...ux.ibm.com, rth@...ddle.net, lethal@...ux-sh.org,
 tony.luck@...el.com, heiko.carstens@...ibm.com, andi@...stfloor.org,
 corbet@....net, crquan@...il.com, mszeredi@...e.cz, miklos@...redi.hu,
 peterz@...radead.org, a.p.zijlstra@...llo.nl, earl_chew@...lent.com,
 npiggin@...il.com, npiggin@...e.de, julia@...u.dk, jaxboe@...ionio.com,
 nikai@...ai.net, dchinner@...hat.com, davej@...hat.com, npiggin@...nel.dk,
 eric.dumazet@...il.com, tim.c.chen@...ux.intel.com, xemul@...allels.com,
 tj@...nel.org, serge.hallyn@...onical.com, gorcunov@...nvz.org,
 bcrl@...ck.org, alan@...rguk.ukuu.org.uk, will.deacon@....com,
 will@...nel.org, zab@...hat.com, balbi@...com, gregkh@...uxfoundation.org,
 rusty@...tcorp.com.au, socketpair@...il.com,
 penguin-kernel@...ove.sakura.ne.jp, mhocko@...nel.org, axboe@...com,
 tglx@...utronix.de, mcgrof@...nel.org, linux@...inikbrodowski.net,
 willy@...radead.org, paulmck@...nel.org, kernel@...force.de,
 linux-morello@...lists.linaro.org
Subject: Re: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping
 processes during pipe read/write

Hi,
(I had to remove many cc, my mail server rejected to send)

On 12/26/24 9:11 PM, Oleg Nesterov wrote:
> I mostly agree, see my reply to this patch, but...
>
> On 12/26, Linus Torvalds wrote:
>> If the optimization is correct, there is no point to having a config option.
>>
>> If the optimization is incorrect, there is no point to having the code.
>>
>> Either way, there's no way we'd ever have a config option for this.
> Agreed,
>
>>> +       if (was_full && pipe_check_wq_has_sleeper(&pipe->wr_wait))
>>>                  wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
>> End result: you need to explain why the race cannot exist.
> Agreed,
>
>> And I think your patch is simply buggy
> Agreed again, but probably my reasoning was wrong,
>
>> But now waiters use "wait_event_interruptible_exclusive()" explicitly
>> outside the pipe mutex, so the waiters and wakers aren't actually
>> serialized.
> This is what I don't understand... Could you spell ?
> I _think_ that
>
> 	wait_event_whatever(WQ, CONDITION);
> vs
>
> 	CONDITION = 1;
> 	if (wq_has_sleeper(WQ))
> 		wake_up_xxx(WQ, ...);
> 	
> is fine.

This pattern is documented in wait.h:

https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/wait.h#L96

Thus if there an issue, then the documentation should be updated.

> Both wq_has_sleeper() and wait_event_whatever()->prepare_to_wait_event()
> have the necessary barriers to serialize the waiters and wakers.
>
> Damn I am sure I missed something ;)

Actually:

Does this work universally? I.e. can we add the optimization into 
__wake_up()?


But I do not understand this comment (from 2.6.0)

https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/kernel/fork.c?h=v2.6.0&id=e220fdf7a39b54a758f4102bdd9d0d5706aa32a7

> /* * Note: we use "set_current_state()" _after_ the wait-queue add, * 
> because we need a memory barrier there on SMP, so that any * 
> wake-function that tests for the wait-queue being active * will be 
> guaranteed to see waitqueue addition _or_ subsequent * tests in this 
> thread will see the wakeup having taken place. * * The spin_unlock() 
> itself is semi-permeable and only protects * one way (it only protects 
> stuff inside the critical region and * stops them from bleeding out - 
> it would still allow subsequent * loads to move into the the critical 
> region). */ voidprepare_to_wait 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/prepare_to_wait>(wait_queue_head_t 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait_queue_head_t>*q,wait_queue_t 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait_queue_t>*wait 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait>,intstate) { 
> unsignedlongflags; wait 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait>->flags&=~WQ_FLAG_EXCLUSIVE 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/WQ_FLAG_EXCLUSIVE>; 
> spin_lock_irqsave 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/spin_lock_irqsave>(&q->lock,flags); 
> if(list_empty 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/list_empty>(&wait 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait>->task_list 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/task_list>)) 
> __add_wait_queue 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/__add_wait_queue>(q,wait 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/wait>); 
> set_current_state 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/set_current_state>(state); 
> spin_unlock_irqrestore 
> <https://elixir.bootlin.com/linux/v2.6.0/C/ident/spin_unlock_irqrestore>(&q->lock,flags); 
> }
set_current_state() now uses smp_store_mb(), which is a memory barrier 
_after_ the store. Thus I do not see what enforces that the store 
happens before the store for the __add_wait_queue().

--

     Manfred

View attachment "0001-__wake_up_common_lock-Optimize-for-empty-wake-queues.patch" of type "text/x-patch" (2525 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ