Message-ID: <20150407160958.GA105179@kernel.org>
Date:	Tue, 7 Apr 2015 09:09:58 -0700
From:	Shaohua Li <shli@...nel.org>
To:	Jens Axboe <axboe@...nel.dk>
Cc:	Jeff Moyer <jmoyer@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] blk-plug: don't flush nested plug lists

On Wed, Apr 08, 2015 at 11:38:14AM -0600, Jens Axboe wrote:
> On 04/06/2015 01:14 PM, Jeff Moyer wrote:
> >The way the on-stack plugging currently works, each nesting level
> >flushes its own list of I/Os.  This can be less than optimal (read
> >awful) for certain workloads.  For example, consider an application
> >that issues asynchronous O_DIRECT I/Os.  It can send down a bunch of
> >I/Os together in a single io_submit call, only to have each of them
> >dispatched individually down in the bowels of the direct I/O code.
> >The reason is that there are blk_plug's instantiated both at the upper
> >call site in do_io_submit and down in do_direct_IO.  The latter will
> >submit as few as one I/O at a time (if you have a small enough I/O
> >size) instead of performing the batching that the plugging
> >infrastructure is supposed to provide.
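
(Roughly the shape being described here -- blk_start_plug()/blk_finish_plug()
and do_io_submit()/do_direct_IO() are the real names, the call sequence is
just a sketch:

	do_io_submit()
		blk_start_plug(&outer_plug);
		...
			do_direct_IO()
				blk_start_plug(&inner_plug);
				submit_bio(...);		/* possibly a single I/O */
				blk_finish_plug(&inner_plug);	/* the per-level flush described above */
		...
		blk_finish_plug(&outer_plug);
)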
> >
> >Now, for the case where there is an elevator involved, this doesn't
> >really matter too much.  The elevator will keep the I/O around long
> >enough for it to be merged.  However, in cases where there is no
> >elevator (like blk-mq), I/Os are simply dispatched immediately.
> >
> >Try this, for example (note I'm using a virtio-blk device, so it's
> >using the blk-mq single queue path, though I've also reproduced this
> >with the micron p320h):
> >
> >fio --rw=read --bs=4k --iodepth=128 --iodepth_batch=16 --iodepth_batch_complete=16 --runtime=10s --direct=1 --filename=/dev/vdd --name=job1 --ioengine=libaio --time_based
> >
> >If you run that on a current kernel, you will get zero merges.  Zero!
> >After this patch, you will get many merges (the actual number depends
> >on how fast your storage is, obviously), and much better throughput.
> >Here are results from my test rig:
> >
> >Unpatched kernel:
> >Read B/W:    283,638 KB/s
> >Read Merges: 0
> >
> >Patched kernel:
> >Read B/W:    873,224 KB/s
> >Read Merges: 2,046K
> >
> >I considered several approaches to solving the problem:
> >1)  get rid of the innermost plugs
> >2)  handle nesting by using only one on-stack plug
> >2a) #2, except use a per-cpu blk_plug struct, which may clean up the
> >     code a bit at the expense of memory footprint
> >
> >Option 1 will be tricky or impossible to do, since the innermost plug
> >lists are sometimes the only plug lists, depending on the call path.
> >Option 2 is what this patch implements.  Option 2a is perhaps a better
> >idea, but since I already implemented option 2, I figured I'd post it
> >for comments and opinions before rewriting it.
> >
> >Much of the patch involves modifying call sites to blk_start_plug,
> >since its signature is changed.  The meat of the patch is actually
> >pretty simple and constrained to block/blk-core.c and
> >include/linux/blkdev.h.  The only tricky bits were places where plugs
> >were finished and then restarted to flush out I/O.  There, I went
> >ahead and exported blk_flush_plug_list and called that directly.
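
(Concretely, a finish-then-restart sequence like

	blk_finish_plug(&plug);
	blk_start_plug(&plug);

presumably becomes a direct flush that leaves the plug installed:

	blk_flush_plug_list(&plug, false);	/* from_schedule == false */

hence the export.)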
> >
> >Comments would be greatly appreciated.
> 
> It's hard to argue with the increased merging for your case. The task plugs
> did originally work the way you've changed them to, not flushing until the
> outermost plug was flushed. Unfortunately I don't quite remember why I
> changed them; I'll have to do a bit of digging to refresh my memory.

The behavior never changed. The current code only flushes the outermost
plug: blk_start_plug() doesn't assign the plug to the current task for an
inner plug, so requests are all added to the outermost plug.
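
For reference, the current blk_start_plug() does roughly the following
(a paraphrased sketch, not a verbatim quote of block/blk-core.c):

	void blk_start_plug(struct blk_plug *plug)
	{
		...
		/* if this is a nested plug, don't actually assign it */
		if (!current->plug)
			current->plug = plug;
	}

so bios submitted in a nested section are queued on the outermost plug,
and an inner blk_finish_plug() only flushes its own, empty, list.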

Maybe the code can be cleaned up as:

start_plug(plug)
{
	if (current->plug)
		return;
	current->plug = plug;
}

end_plug(plug)
{
	if (plug != current->plug)
		return;
	flush_plug(plug);
	current->plug = NULL;
}
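
With that, an inner end_plug() returns early because plug != current->plug,
so only the outermost level ever flushes; inner levels become no-ops instead
of flushing empty lists.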
