linux-kernel - Re: write-behind on streaming writes

Date:	Tue, 5 Jun 2012 13:18:51 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Fengguang Wu <fengguang.wu@...el.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	"Myklebust, Trond" <Trond.Myklebust@...app.com>,
	linux-fsdevel@...r.kernel.org,
	Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: write-behind on streaming writes

On Tue, Jun 05, 2012 at 11:01:48AM +1000, Dave Chinner wrote:
> On Wed, May 30, 2012 at 11:21:29AM +0800, Fengguang Wu wrote:
> > Linus,
> > 
> > On Tue, May 29, 2012 at 10:35:46AM -0700, Linus Torvalds wrote:
> > > On Tue, May 29, 2012 at 8:57 AM, Fengguang Wu <fengguang.wu@...el.com> wrote:
> > > I just suspect that we'd be better off teaching upper levels about the
> > > streaming. I know for a fact that if I do it by hand, system
> > > responsiveness was *much* better, and IO throughput didn't go down at
> > > all.
> > 
> > Your observation of better responsiveness may well stem from
> > these two aspects:
> > 
> > 1) lower dirty/writeback pages
> > 2) the async write IO queue being drained constantly
> > 
> > (1) is obvious. For a mem=4G desktop, the default dirty limit can be
> > up to (4096MB * 20% = 819MB), while your smart writer effectively
> > limits dirty/writeback pages to a dramatically lower 16MB.
> > 
> > (2) comes from the use of _WAIT_ flags in
> > 
> >         sync_file_range(..., SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER);
> > 
> > Each sync_file_range() syscall will submit 8MB of write IO and wait for
> > completion. That means the async write IO queue constantly swings
> > between 0 and 8MB full, with a period of (8MB / 100MBps = 80ms).
> > So every 80ms the async IO queue runs empty, which gives any
> > pending read IO (from firefox etc.) a chance to be serviced. Nice
> > and sweet breaks!
> > 
> > I suspect (2) contributes *much more* than (1) to desktop responsiveness.
> 
> Almost certainly, especially with NCQ devices where even if the IO
> scheduler preempts the write queue immediately, the device might
> complete the outstanding 31 writes before servicing the read which
> is issued as the 32nd command....

CFQ does preempt async IO once sync IO gets queued.

> 
> So NCQ depth is going to play a part here as well.

Yes, NCQ depth is a primary contributor to READ latencies in the presence
of async IO. I think disk drivers and disk firmware should also participate
in prioritizing READs over pending WRITEs to improve the situation.

The IO scheduler can only do so much. CFQ already tries hard to keep the
pending async queue depth low, and that often results in lower throughput
(as compared to deadline).

In fact, CFQ tries so hard to prioritize SYNC IO over async IO that I have
often heard of WRITEs being starved and people hitting "task blocked for
more than 120 seconds" warnings.

Thanks
Vivek
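
For reference, the write pattern discussed in the quoted text can be
sketched in C roughly as follows. This is a minimal sketch, not the program
Linus actually ran: the helper names (pwrite_all, stream_write), the single
reusable buffer, and starting at file offset 0 are assumptions made for
illustration. The sync_file_range() flags are the ones quoted above, and
the chunk size is left to the caller (8MB in the discussion).

```c
#define _GNU_SOURCE              /* needed for sync_file_range() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write exactly len bytes at offset off,
 * retrying on short writes and EINTR. Returns 0 on success, -1 on error.
 */
static int pwrite_all(int fd, const char *buf, size_t len, off_t off)
{
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: write the same chunk nchunks times, back to back,
 * starting at file offset 0 (an assumption of this sketch). After each
 * chunk, submit that range for writeback and wait for it to complete, so
 * at most about one chunk of this file is dirty or under writeback.
 */
static int stream_write(int fd, const char *chunk, size_t chunk_len,
                        unsigned int nchunks)
{
    const unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
                               SYNC_FILE_RANGE_WRITE |
                               SYNC_FILE_RANGE_WAIT_AFTER;
    off_t off = 0;

    assert(chunk_len > 0);
    for (unsigned int i = 0; i < nchunks; i++) {
        if (pwrite_all(fd, chunk, chunk_len, off) < 0)
            return -1;
        /* Blocks until this chunk's pages have been written out. */
        if (sync_file_range(fd, off, (off_t)chunk_len, flags) < 0)
            return -1;
        off += (off_t)chunk_len;
    }
    return 0;
}
```

A caller would open the output file, fill one 8MB buffer, and call
stream_write(fd, buf, 8 << 20, nchunks). Note that sync_file_range() does
not flush file metadata, so this throttles writeback rather than providing
a durability guarantee.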