Message-ID: <gspf7guqczppgfrus5lfhinyl62xezc4h7nqcnd4m7243v4mna@hxmu2wousrh7>
Date: Wed, 25 Dec 2024 11:32:24 -0500
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: Andy Shevchenko <andriy.shevchenko@...ux.intel.com>, 
	WangYuli <wangyuli@...ontech.com>, viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, yushengjin@...ontech.com, 
	zhangdandan@...ontech.com, guanwentao@...ontech.com, zhanjun@...ontech.com, 
	oliver.sang@...el.com, ebiederm@...ssion.com, colin.king@...onical.com, 
	josh@...htriplett.org, penberg@...helsinki.fi, manfred@...orfullife.com, mingo@...e.hu, 
	jes@....com, hch@....de, aia21@...tab.net, arjan@...radead.org, 
	jgarzik@...ox.com, neukum@...hschaft.cup.uni-muenchen.de, oliver@...kum.name, 
	dada1@...mosbay.com, axboe@...nel.dk, axboe@...e.de, nickpiggin@...oo.com.au, 
	dhowells@...hat.com, nathans@....com, rolandd@...co.com, tytso@....edu, 
	bunk@...sta.de, pbadari@...ibm.com, ak@...ux.intel.com, ak@...e.de, 
	davem@...emloft.net, jsipek@...sunysb.edu
Subject: Re: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping
 processes during pipe read/write

On Wed, Dec 25, 2024 at 05:04:46PM +0100, Mateusz Guzik wrote:
> On Wed, Dec 25, 2024 at 08:53:05AM -0500, Kent Overstreet wrote:
> > On Wed, Dec 25, 2024 at 03:30:05PM +0200, Andy Shevchenko wrote:
> > > Don't you think the Cc list is a bit overloaded?
> > 
> > Indeed, my mail server doesn't let me reply-all.
> > 
> > > On Wed, Dec 25, 2024 at 05:42:02PM +0800, WangYuli wrote:
> > > > +config PIPE_SKIP_SLEEPER
> > > > +	bool "Skip sleeping processes during pipe read/write"
> > > > +	default n
> > > 
> > > 'n' is the default 'default', no need to have this line.
> > 
> > Actually, I'd say to skip the kconfig option for this. Kconfig options
> > that affect the behaviour of core code increase our testing burden, and
> > are another variable to account for when chasing down bugs, and the
> > potential overhead looks negligible.
> > 
> 
> I agree the behavior should not be guarded by an option. However,
> because of how wq_has_sleeper is implemented (see below) I would argue
> this needs to show how often locking can be avoided in real workloads.
> 
> The commit message does state this comes with a slowdown for cases which
> can't avoid wakeups, but as it stands I thought the submitter just meant an
> extra branch.
> 
> > Also, did you look at adding this optimization to wake_up()? No-op
> > wakeups are very common, I think this has wider applicability.
> 
> I was going to suggest it myself, but then:
> 
> static inline bool wq_has_sleeper(struct wait_queue_head *wq_head)
> {
>         /*
>          * We need to be sure we are in sync with the
>          * add_wait_queue modifications to the wait queue.
>          *
>          * This memory barrier should be paired with one on the
>          * waiting side.
>          */
>         smp_mb();
>         return waitqueue_active(wq_head);
> }
> 
> Which means this is in fact quite expensive.
>
> Since wakeup is a lock + an interrupt trip, it would still be
> cheaper single-threaded to "merely" suffer a full fence, and for cases
> where the queue is empty often enough this is definitely the right thing
> to do.

We're comparing against no-op wakeup. A real wakeup does an IPI, which
completely dwarfs the cost of a barrier.

And note that wake_up() is spin_lock_irqsave(), not spin_lock(). I
assume it's gotten better, but back when I was looking at waitqueues
nested pushf/popf was horrifically expensive.

But perhaps we can do this with just a release barrier? Similar to how
list_empty_careful() works.
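
A minimal sketch of what such a weaker check might look like, modeled on
list_empty_careful() (the name wq_has_sleeper_careful is hypothetical, and
whether an acquire load is actually sufficient to pair with the waiter side
is precisely the open question here):

#include <linux/wait.h>

static inline bool wq_has_sleeper_careful(struct wait_queue_head *wq_head)
{
        /*
         * Acquire load of the list head instead of the full smp_mb()
         * in wq_has_sleeper().  A non-empty list means somebody is
         * (or is about to be) waiting.
         */
        struct list_head *next = smp_load_acquire(&wq_head->head.next);

        return next != &wq_head->head;
}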

> On the other hand, if this executes when the queue is mostly *not* empty,
> it defeats the point.
> 
> So unfortunately embedding this in wake_up is a no-go.

You definitely can't say that without knowing how often no-op
wake_up()s occur. It wouldn't be hard to gather that (write a patch to
add a pair of percpu counters, throw it on a few machines running random
workloads) and I think the results might surprise you.
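
A minimal sketch of the kind of instrumentation meant here (the helper name
account_wakeup and the counter names are hypothetical; the counters would
still need a call site at the top of the wakeup path and some way to read
them back, e.g. a debugfs file):

#include <linux/percpu.h>
#include <linux/wait.h>

static DEFINE_PER_CPU(unsigned long, wakeup_total);
static DEFINE_PER_CPU(unsigned long, wakeup_noop);

static inline void account_wakeup(struct wait_queue_head *wq_head)
{
        this_cpu_inc(wakeup_total);

        /* Racy, but good enough for statistics: no waiters => no-op wakeup. */
        if (!waitqueue_active(wq_head))
                this_cpu_inc(wakeup_noop);
}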
