linux-kernel - Re: Linux 2.6.29

Date:	Fri, 3 Apr 2009 16:52:01 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Jeff Garzik <jeff@...zik.org>
cc:	Mark Lord <lkml@....ca>,
	Lennart Sorensen <lsorense@...lub.uwaterloo.ca>,
	Jens Axboe <jens.axboe@...cle.com>,
	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>, tytso@....edu,
	drees76@...il.com, jesper@...gh.cc,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

On Fri, 3 Apr 2009, Jeff Garzik wrote:
> 
> If all you want to do is _start_ the write-out from kernel to disk, and let
> the kernel handle it asynchronously, SYNC_FILE_RANGE_WRITE will do that for
> you, eliminating the need for a separate thread.

It may not eliminate the need for a separate thread.

SYNC_FILE_RANGE_WRITE will still block on things. It just will block on 
_much_ less than fsync.

In particular, it will block on:

 - actually queuing up the IO (ie we need to get the bio, request etc all 
   allocated and queued up)

 - if a page is under writeback, and has been marked dirty since that 
   writeback started, we'll wait for that IO to finish in order to start a 
   new one.

and depending on load, both of these things _can_ be issues and you might 
still want to do the SYNC_FILE_RANGE_WRITE in an async thread separate 
from the main loop so that the latency of the main loop is not 
affected by that.
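
A minimal sketch of the call under discussion, assuming Linux with glibc. 
The helper names write_fully() and start_writeout() are hypothetical and 
do not come from this thread. The second helper only asks the kernel to 
start writeout for a byte range; it does not wait for the IO to finish, 
but it can still block for the two reasons listed above.

```c
#define _GNU_SOURCE          /* exposes sync_file_range() in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf at the given offset.
 * Returns 0 on success, -1 on error (errno set). */
int write_fully(int fd, const char *buf, size_t len, off_t offset)
{
    assert(fd >= 0 && buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted, retry */
            return -1;
        }
        if (n == 0) {                /* no progress: treat as an error */
            errno = EIO;
            return -1;
        }
        buf += n;
        offset += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: ask the kernel to start writeout of the dirty
 * pages in [offset, offset + nbytes). Does not wait for completion.
 * Returns 0 on success, -1 on error (errno set). */
int start_writeout(int fd, off_t offset, off_t nbytes)
{
    assert(fd >= 0);
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}
```

A latency-sensitive program could call start_writeout() from a worker 
thread rather than from its main loop, so that any blocking inside the 
call stays off the main loop.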

But the latencies will be _much_ smaller issues than with f[data]sync(), 
especially if you're not ever really hitting the limits on the 
disk subsystem. Because those calls will additionally

 - wait for all old writeback to complete (whether the page was dirtied 
   after the writeback started or not)

 - additionally, wait for all the new writeback it started.

 - wait for the metadata too (fsync()).

so they are pretty much _guaranteed_ to sleep for actual IO to complete 
(unless you didn't write anything at all to the file ;)
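
To put the three levels side by side, here is a rough sketch, again 
assuming Linux with glibc. The enum and the flush_file() wrapper are 
hypothetical names made up for illustration. Only the first mode avoids 
waiting for IO completion; the other two are the f[data]sync() calls 
described above.

```c
#define _GNU_SOURCE          /* exposes sync_file_range() in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names, ordered from cheapest to most expensive. */
enum flush_mode {
    FLUSH_START_ONLY,        /* start writeout, do not wait for it    */
    FLUSH_DATA,              /* fdatasync(): wait for the file data   */
    FLUSH_DATA_AND_METADATA  /* fsync(): wait for data and metadata   */
};

/* Returns 0 on success, -1 on error (errno set).
 * offset and nbytes are only used by FLUSH_START_ONLY; the other two
 * modes always act on the whole file. */
int flush_file(int fd, off_t offset, off_t nbytes, enum flush_mode mode)
{
    assert(offset >= 0 && nbytes >= 0);
    switch (mode) {
    case FLUSH_START_ONLY:
        return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
    case FLUSH_DATA:
        return fdatasync(fd);
    case FLUSH_DATA_AND_METADATA:
        return fsync(fd);
    }
    errno = EINVAL;          /* unknown mode */
    return -1;
}
```

Note that FLUSH_START_ONLY gives no durability guarantee on its own: 
when the call returns, the data may still be in flight.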

> On a related subject, reads:  consider posix_fadvise(POSIX_FADV_SEQUENTIAL)
> and/or readahead(2) for optimizing the reading side of things.

I doubt POSIX_FADV_SEQUENTIAL will do very much. The kernel tends to 
figure out the read patterns on its own pretty well. Of course, explicit 
readahead() can be noticeable for the right patterns.
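
For reference, a small sketch of the two read-side calls mentioned here, 
assuming Linux with glibc. The wrapper names hint_sequential() and 
prefetch_range() are hypothetical. Whether either call helps depends on 
the access pattern, as noted above.

```c
#define _GNU_SOURCE          /* exposes readahead() in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: tell the kernel the whole file will be read
 * sequentially (offset 0 and len 0 mean "to end of file").
 * Returns 0 on success or an error number; posix_fadvise() reports
 * errors through its return value, not through errno. */
int hint_sequential(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}

/* Hypothetical wrapper: ask the kernel to read
 * [offset, offset + count) into the page cache ahead of use.
 * Returns 0 on success, -1 on error (errno set). */
int prefetch_range(int fd, off_t offset, size_t count)
{
    assert(offset >= 0);
    return (int)readahead(fd, offset, count);
}
```

A reader that knows it will need a distant range soon (say, the next 
chunk of a large file while still processing the current one) could 
call prefetch_range() on that range before getting there.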

		Linus