linux-kernel - Re: [patch v3 2/3] block: hold queue if flush is running for non-queueable flush drive

Message-ID: <20110509135851.GD5975@redhat.com>
Date:	Mon, 9 May 2011 09:58:51 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Shaohua Li <shaohua.li@...el.com>
Cc:	Tejun Heo <tj@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-ide@...r.kernel.org" <linux-ide@...r.kernel.org>,
	"jaxboe@...ionio.com" <jaxboe@...ionio.com>,
	"hch@...radead.org" <hch@...radead.org>,
	"jgarzik@...ox.com" <jgarzik@...ox.com>,
	"djwong@...ibm.com" <djwong@...ibm.com>,
	"sshtylyov@...sta.com" <sshtylyov@...sta.com>,
	James Bottomley <James.Bottomley@...senPartnership.com>,
	"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
	"ricwheeler@...il.com" <ricwheeler@...il.com>
Subject: Re: [patch v3 2/3] block: hold queue if flush is running for
 non-queueable flush drive

On Mon, May 09, 2011 at 09:50:46PM +0800, Shaohua Li wrote:
> On Mon, May 09, 2011 at 09:03:16PM +0800, Vivek Goyal wrote:
> > On Thu, May 05, 2011 at 10:38:53AM +0200, Tejun Heo wrote:
> > 
> > [..]
> > > Similarly, I'd like to suggest something like the following.
> > > 
> > > 		/*
> > > 		 * Hold dispatching of regular requests if non-queueable
> > > 		 * flush is in progress; otherwise, the low level driver
> > > 		 * would keep dispatching IO requests just to requeue them
> > > 		 * until the flush finishes, which not only adds
> > > 		 * dispatching / requeueing overhead but may also
> > > 		 * significantly affect throughput when multiple flushes
> > > 		 * are issued back-to-back.  Please consider the following
> > > 		 * scenario.
> > > 		 *
> > > 		 * - flush1 is dispatched with write1 in the elevator.
> > > 		 *
> > > 		 * - Driver dispatches write1 and requeues it.
> > > 		 *
> > > 		 * - flush2 is issued and appended to dispatch queue after
> > > 		 *   the requeued write1.  As write1 has been requeued
> > > 		 *   flush2 can't be put in front of it.
> > > 		 *
> > > 		 * - When flush1 finishes, the driver has to process write1
> > > 		 *   before flush2 even though there's no fundamental
> > > 		 *   reason flush2 can't be processed first and, when two
> > > 		 *   flushes are issued back-to-back without intervening
> > > 		 *   writes, the second one essentially becomes noop.
> > > 		 *
> > > 		 * This phenomenon becomes quite visible under heavy
> > > 		 * concurrent fsync workload and holding the queue while
> > > 		 * flush is in progress leads to significant throughput
> > > 		 * gain.
> > > 		 */
> > 
> > Tejun,
> > 
> > I am assuming that these back-to-back flushes are independent of each
> > other; otherwise a write request will get between the two flushes anyway.
> Hi,
> yes, the flushes are independent.
>  
> > If that's the case, then should we solve the problem by improving the flush
> > merge logic a bit? (Say, idle a bit before issuing a flush only
> > if the request queue is not empty.)
> I tried some ways to improve the flush merge logic. The problem I observed is
> something like this: say we have 10 flushes. Originally we dispatch 4 flushes,
> a write, then 6 flushes. With more merging we get 6 flushes, a write, then
> 4 flushes. The number of flush requests sent to the drive isn't reduced.

If we try to get rid of WRITE completely between these 10 flushes then we
run the risk of starving other READS/WRITES as long as flushes are going on.

> Another reason I didn't see improvement with better back-to-back merging might
> be that the drive already optimizes the case of two adjacent flushes well.

I did not understand this one. With improved back-to-back merge logic
we have already optimized the flush case. So for 10 flushes and one write
we seem to be issuing the following (as per your mail).

1 flush (6 flushes merged) --> WRITE --> 1 flush (4 flushes merged).

So where is the opportunity for the drive (a non-queueable flush drive) to
optimize flushes here?

IOW, if flush merging is already working well, do we really want to move
in a direction where we can potentially starve other READs/WRITEs
in an attempt to improve throughput for a specific workload?
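
To make the flush counts in this exchange concrete, here is a toy model. It is
not kernel code: the function name and the "F"/"W" string encoding are made up
for illustration, and it assumes that every run of adjacent flushes is merged
into exactly one flush command, which is a simplification of the real merge
logic.

```python
def flush_commands_sent(requests):
    """Count flush commands reaching the drive in a dispatch sequence.

    `requests` is a string of 'F' (flush) and 'W' (write).
    Assumption: each run of adjacent flushes is merged into one command.
    """
    count = 0
    prev = None
    for req in requests:
        # A flush starts a new command only if it does not follow a flush.
        if req == "F" and prev != "F":
            count += 1
        prev = req
    return count


original = "F" * 4 + "W" + "F" * 6      # 4 flushes, write, 6 flushes
more_merged = "F" * 6 + "W" + "F" * 4   # 6 flushes, write, 4 flushes
no_write_between = "F" * 10             # all 10 flushes adjacent

print(flush_commands_sent(original))          # 2
print(flush_commands_sent(more_merged))       # 2
print(flush_commands_sent(no_write_between))  # 1
```

Under this model, moving the WRITE within the sequence leaves the count at two;
the count drops to one only when no WRITE sits between the flushes, which is
the case where the starvation concern above applies.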

Thanks
Vivek