Message-Id: <1237684330-11770-1-git-send-email-tytso@mit.edu>
Date:	Sat, 21 Mar 2009 21:12:08 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	linux-kernel@...r.kernel.org
Cc:	akpm@...ux-foundation.org,
	Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: [PATCH, RFC 0/2] Mitigate fsync's with ext3's data=ordered mode


Given the recent hoo-hah about ext4 and delayed allocation, I reviewed
the history of the Firefox 3.0 bug here:

    https://bugzilla.mozilla.org/show_bug.cgi?id=421482

Reports of fsync() being delayed by up to 30 seconds didn't make
sense to me, since there shouldn't be that much data waiting to be
flushed out, even if there was a very heavy write-intensive job
writing multiple gigabytes to the file.  When I looked more closely,
it became clear that what was really going on was a *read* intensive
job that was starving writes, because the writes submitted from the
journal use WRITE instead of WRITE_SYNC, and I/O schedulers tend to
prioritize reads ahead of writes.
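
To make the starvation argument above concrete, here is a toy model.
It is not how any real I/O scheduler is implemented; it only assumes
that reads and WRITE_SYNC requests are dispatched ahead of plain WRITE
requests.  The names dispatch_order() and journal_position() are
made up for this illustration.

```python
def dispatch_order(requests):
    """Return request names in the order a toy scheduler dispatches them.

    Each request is a (name, kind) tuple, where kind is "READ",
    "WRITE" or "WRITE_SYNC".  Assumption for this sketch: READ and
    WRITE_SYNC go first (in arrival order), plain WRITE goes last.
    """
    urgent = [name for name, kind in requests if kind in ("READ", "WRITE_SYNC")]
    deferred = [name for name, kind in requests if kind == "WRITE"]
    return urgent + deferred


def journal_position(journal_kind, n_reads=3):
    """Position of one journal write that arrived before n_reads reads."""
    requests = [("journal", journal_kind)]
    requests += [("read%d" % i, "READ") for i in range(n_reads)]
    return dispatch_order(requests).index("journal")
```

In this model a journal write tagged WRITE lands behind every pending
read, while the same write tagged WRITE_SYNC keeps its place in the
queue, which is the effect a waiting fsync() cares about.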

This also explains why Aryan's patch, which forced a higher I/O
priority for kjournald, was helpful.  The approach in these patches
is better, since we only force journal blocks out using WRITE_SYNC
if the transaction was triggered by something synchronous, such as an
fsync() call or a file descriptor opened with O_SYNC.  The first
patch does cause data blocks forced out using data=ordered to be
written out using WRITE_SYNC even if the commit was kicked off by the
5 second commit interval.  However, it does make the right thing
happen when the blocks are being forced out due to fsync() or
fdatasync(); before, those writes were being submitted without being
marked as synchronous writes.

If it is considered highly objectionable that asynchronous commits
will result in WRITE_SYNC writes, we could add a new flag to the wbc
structure which could be passed all the way down to
block_write_full_page().  On the other hand, in the long run it's
better for commits to complete sooner rather than later, since a
subsequent transaction could end up blocked behind the current
transaction, and that subsequent transaction could be a synchronous
one blocking an fsync() or some other synchronous operation.

I've done experiments with and without these patches, and they
definitely help: when a heavy read-intensive job is starving the
writes, fsync() latency drops by about 75%.  The workload I used was
a tar command; I suspect that if I had used a dd of a really huge
file, the fsync times without the patch would be even worse, and the
concomitant improvements would be even better.
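
For anyone who wants to try a similar comparison, a minimal sketch of
an fsync() latency probe follows.  It is not the harness used for the
numbers above; time_fsync(), improvement() and the 4 KiB payload are
assumptions made for this example.  Run it while a read-heavy job
(such as the tar mentioned above) is active, once on each kernel.

```python
import os
import tempfile
import time


def time_fsync(path, payload=b"x" * 4096, rounds=5):
    """Append payload to path and time each fsync() call, in seconds."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(rounds):
            os.write(fd, payload)
            start = time.monotonic()
            os.fsync(fd)  # blocks until the data reaches stable storage
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
    return latencies


def improvement(before, after):
    """Fractional latency reduction, e.g. 0.75 for a 75% drop."""
    return (before - after) / before


if __name__ == "__main__":
    # Probe a scratch file; point the directory at the filesystem under test.
    with tempfile.TemporaryDirectory() as scratch:
        samples = time_fsync(os.path.join(scratch, "probe"))
    print("worst fsync latency: %.3f s" % max(samples))
```

Feeding the worst-case latency from the unpatched and patched runs
into improvement() gives a figure directly comparable to the 75%
quoted above.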

						- Ted
