lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wj5A-fO+GnfwqGpXhFbfpS4+_8xU+dnXkSx+0AfwBYrxA@mail.gmail.com>
Date: Thu, 26 Dec 2024 11:02:45 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: WangYuli <wangyuli@...ontech.com>
Cc: viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	yushengjin@...ontech.com, zhangdandan@...ontech.com, guanwentao@...ontech.com, 
	zhanjun@...ontech.com, oliver.sang@...el.com, ebiederm@...ssion.com, 
	colin.king@...onical.com, josh@...htriplett.org, penberg@...helsinki.fi, 
	manfred@...orfullife.com, mingo@...e.hu, jes@....com, hch@....de, 
	aia21@...tab.net, arjan@...radead.org, jgarzik@...ox.com, 
	neukum@...hschaft.cup.uni-muenchen.de, oliver@...kum.name, 
	dada1@...mosbay.com, axboe@...nel.dk, axboe@...e.de, nickpiggin@...oo.com.au, 
	dhowells@...hat.com, nathans@....com, rolandd@...co.com, tytso@....edu, 
	bunk@...sta.de, pbadari@...ibm.com, ak@...ux.intel.com, ak@...e.de, 
	davem@...emloft.net, jsipek@...sunysb.edu, jens.axboe@...cle.com, 
	ramsdell@...re.org, hch@...radead.org, akpm@...ux-foundation.org, 
	randy.dunlap@...cle.com, efault@....de, rdunlap@...radead.org, 
	haveblue@...ibm.com, drepper@...hat.com, dm.n9107@...il.com, jblunck@...e.de, 
	davidel@...ilserver.org, mtk.manpages@...glemail.com, 
	linux-arch@...r.kernel.org, vda.linux@...glemail.com, jmorris@...ei.org, 
	serue@...ibm.com, hca@...ux.ibm.com, rth@...ddle.net, lethal@...ux-sh.org, 
	tony.luck@...el.com, heiko.carstens@...ibm.com, oleg@...hat.com, 
	andi@...stfloor.org, corbet@....net, crquan@...il.com, mszeredi@...e.cz, 
	miklos@...redi.hu, peterz@...radead.org, a.p.zijlstra@...llo.nl, 
	earl_chew@...lent.com, npiggin@...il.com, npiggin@...e.de, julia@...u.dk, 
	jaxboe@...ionio.com, nikai@...ai.net, dchinner@...hat.com, davej@...hat.com, 
	npiggin@...nel.dk, eric.dumazet@...il.com, tim.c.chen@...ux.intel.com, 
	xemul@...allels.com, tj@...nel.org, serge.hallyn@...onical.com, 
	gorcunov@...nvz.org, levinsasha928@...il.com, penberg@...nel.org, 
	amwang@...hat.com, bcrl@...ck.org, muthu.lkml@...il.com, muthur@...il.com, 
	mjt@....msk.ru, alan@...rguk.ukuu.org.uk, raven@...maw.net, thomas@...3r.de, 
	will.deacon@....com, will@...nel.org, josef@...hat.com, 
	anatol.pomozov@...il.com, koverstreet@...gle.com, zab@...hat.com, 
	balbi@...com, gregkh@...uxfoundation.org, mfasheh@...e.com, 
	jlbec@...lplan.org, rusty@...tcorp.com.au, asamymuthupa@...ron.com, 
	smani@...ron.com, sbradshaw@...ron.com, jmoyer@...hat.com, sim@...tway.ca, 
	ia@...udflare.com, dmonakhov@...nvz.org, ebiggers3@...il.com, 
	socketpair@...il.com, penguin-kernel@...ove.sakura.ne.jp, w@....eu, 
	kirill.shutemov@...ux.intel.com, mhocko@...e.com, vdavydov.dev@...il.com, 
	vdavydov@...tuozzo.com, hannes@...xchg.org, mhocko@...nel.org, 
	minchan@...nel.org, deepa.kernel@...il.com, arnd@...db.de, balbi@...nel.org, 
	swhiteho@...hat.com, konishi.ryusuke@....ntt.co.jp, dsterba@...e.com, 
	vegard.nossum@...cle.com, axboe@...com, pombredanne@...b.com, 
	tglx@...utronix.de, joe.lawrence@...hat.com, mpatocka@...hat.com, 
	mcgrof@...nel.org, keescook@...omium.org, linux@...inikbrodowski.net, 
	jannh@...gle.com, shakeelb@...gle.com, guro@...com, willy@...radead.org, 
	khlebnikov@...dex-team.ru, kirr@...edi.com, stern@...land.harvard.edu, 
	elver@...gle.com, parri.andrea@...il.com, paulmck@...nel.org, 
	rasibley@...hat.com, jstancek@...hat.com, avagin@...il.com, cai@...hat.com, 
	josef@...icpanda.com, hare@...e.de, colyli@...e.de, johannes@...solutions.net, 
	sspatil@...roid.com, alex_y_xu@...oo.ca, mgorman@...hsingularity.net, 
	gor@...ux.ibm.com, jhubbard@...dia.com, andriy.shevchenko@...ux.intel.com, 
	crope@....fi, yzaikin@...gle.com, bfields@...ldses.org, jlayton@...nel.org, 
	kernel@...force.de, steve@....org, nixiaoming@...wei.com, 
	0x7f454c46@...il.com, kuniyu@...zon.co.jp, alexander.h.duyck@...el.com, 
	kuni1840@...il.com, soheil@...gle.com, sridhar.samudrala@...el.com, 
	Vincenzo.Frascino@....com, chuck.lever@...cle.com, Kevin.Brodsky@....com, 
	Szabolcs.Nagy@....com, David.Laight@...lab.com, Mark.Rutland@....com, 
	linux-morello@...lists.linaro.org, Luca.Vizzarro@....com, 
	max.kellermann@...os.com, adobriyan@...il.com, lukas@...auer.dev, 
	j.granados@...sung.com, djwong@...nel.org, kent.overstreet@...ux.dev, 
	linux@...ssschuh.net, kstewart@...icios.com
Subject: Re: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping
 processes during pipe read/write

On Wed, 25 Dec 2024 at 01:44, WangYuli <wangyuli@...ontech.com> wrote:
>
>
> +config PIPE_SKIP_SLEEPER
> +       bool "Skip sleeping processes during pipe read/write"

If the optimization is correct, there is no point to having a config option.

If the optimization is incorrect, there is no point to having the code.

Either way, there's no way we'd ever have a config option for this.

> +pipe_check_wq_has_sleeper(struct wait_queue_head *wq_head)
> +{
> +               return wq_has_sleeper(wq_head);

So generally, the reason this is buggy is that it's racy due to either
CPU memory ordering issues or simply because the sleeper is going to
sleep but hasn't *quite* added itself to the wait queue.

Which is why the wakeup path takes the wq head lock.

Which is the only thing you are actually optimizing away.

> +       if (was_full && pipe_check_wq_has_sleeper(&pipe->wr_wait))
>                 wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);

End result: you need to explain why the race cannot exist.

And I think your patch is simply buggy because the race can in fact
exist, because there is no other lock than the waitqueue lock that
enforces memory ordering and protects against the "somebody is just
about to sleep" race.

That said, *if* all the wait queue work was done under the pipe mutex,
we could use the pipe mutex for exclusion instead.

That's actually how it kind of used to work long long ago (not really
- but close: it depended on the BKL originally, and adding waiters to
the wait-queues early before all the tests for full/empty, and
avoiding the races that way)

But now waiters use "wait_event_interruptible_exclusive()" explicitly
outside the pipe mutex, so the waiters and wakers aren't actually
serialized.

And no, you are unlikely to see the races in practice (particularly
the memory ordering ones). So you have to *think* about them, not test
them.

         Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ