linux-ext4 - Re: Performance testing of various barrier reduction patches [was: Re: [RFC v4] ext4: Coordinate fsync requests]

Message-ID: <20100923232527.GB25624@tux1.beaverton.ibm.com>
Date:	Thu, 23 Sep 2010 16:25:27 -0700
From:	"Darrick J. Wong" <djwong@...ibm.com>
To:	Andreas Dilger <adilger@...ger.ca>
Cc:	"Ted Ts'o" <tytso@....edu>, Mingming Cao <cmm@...ibm.com>,
	Ric Wheeler <rwheeler@...hat.com>,
	linux-ext4 <linux-ext4@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Keith Mannthey <kmannth@...ibm.com>,
	Mingming Cao <mcao@...ibm.com>, Tejun Heo <tj@...nel.org>,
	hch@....de
Subject: Re: Performance testing of various barrier reduction patches [was:
	Re: [RFC v4] ext4: Coordinate fsync requests]

Hi all,

I just retested with 2.6.36-rc5 and the same set of patches as before
(flush_fua, fsync_coordination, etc) and have an even larger spreadsheet:
http://bit.ly/ahdhyk

This time, however, I instrumented the kernel to report the amount of time it
takes to complete the flush operation.  The test setups elm3a63, elm3c44_sas,
and elm3c71_sas are all arrays that have battery-backed write-back cache, so it
should not be a huge shock that the average flush time generally stays under
8ms for these setups.  elm3c65 and elm3c75_ide are single-disk SAS and IDE
systems (no write cache), and the other setups all feature md-raids backed by
SCSI disks (also no write cache).  The flush_times tab in the spreadsheet lists
average, max, and min sync times.

Turning to the ffsb scores, I can see some of the same results that I saw while
testing 2.6.36-rc1 a few weeks ago.  Now that I've had the time to look at how
the code works and evaluate a lot more setups, I think I can speculate further
about the cause of the regression that I see with the fsync coordination patch.
Because I'm testing the effects of varying the fsync_delay values, I've bolded
the highest score for each unique (directio, nojan, nodj) configuration.  Most
of the winning cases are either fsync_delay=0, which corresponds to the old
fsync behavior (every caller issues a flush), or fsync_delay=-1, which
corresponds to a coordination delay equal to the average flush duration.
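
To make the two fsync_delay cases above concrete, here is a minimal sketch in
Python (not the kernel patch itself).  The function name
coordination_delay_ms is hypothetical, and treating a positive fsync_delay as
a fixed delay in milliseconds is an assumption; only the 0 and -1 cases are
described in this message.

```python
def coordination_delay_ms(fsync_delay, avg_flush_ms):
    """Return how long an fsync caller would wait before flushing, in ms.

    Hypothetical helper for illustration only.  The 0 and -1 cases follow
    the description in this message; treating a positive value as a fixed
    delay in milliseconds is an assumption.
    """
    if fsync_delay == 0:
        # Old behavior: no coordination, every caller issues its own flush.
        return 0.0
    if fsync_delay == -1:
        # Wait for as long as an average flush takes.
        return float(avg_flush_ms)
    if fsync_delay > 0:
        # Assumed: a fixed, administrator-chosen delay.
        return float(fsync_delay)
    raise ValueError("unsupported fsync_delay: %r" % (fsync_delay,))
```

With an average flush time of 12.5ms, fsync_delay=-1 would wait 12.5ms for
other callers to join the flush, while fsync_delay=0 would not wait at all.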

To try to find an explanation, I started looking for connections between fsync
delay values and average flush times.  I noticed that the setups with low (<
8ms) flush times exhibit better performance when fsync coordination is not
attempted, and the setups with higher flush times exhibit better performance
when fsync coordination happens.  This is also no surprise: the more
time-consuming a flush is, the more worthwhile it is to spend a little time
coordinating those flushes across CPUs.

I think a reasonable next step would be to alter this patch so that
ext4_sync_file always measures the duration of the flushes that it issues, but
only enable the coordination steps if it detects the flushes taking more than
about 8ms.  One thing I don't know for sure is whether 8ms is an artifact of
the timer tick (two jiffies at the current HZ=250 is 8ms) or a hardware
property.
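
The proposed next step could look roughly like the sketch below.  It is
illustrative Python rather than kernel code: FlushTracker and jiffies_to_ms
are hypothetical names, the plain running average is an assumption about how
the durations would be aggregated, and the 8ms threshold is simply the value
observed in these results.  The helper at the end shows the tick arithmetic:
at HZ=250 one jiffy is 4ms, so two jiffies come to 8ms.

```python
class FlushTracker:
    """Track flush durations and decide whether to coordinate fsyncs.

    Hypothetical sketch: always measure, but only enable coordination once
    the average flush time exceeds a threshold (8ms assumed here).
    """

    def __init__(self, threshold_ms=8.0):
        self.threshold_ms = threshold_ms
        self.count = 0
        self.total_ms = 0.0

    def record(self, duration_ms):
        # Called after every flush, whether or not coordination is enabled.
        self.count += 1
        self.total_ms += duration_ms

    @property
    def average_ms(self):
        # Plain running average; 0.0 until the first flush is recorded.
        return self.total_ms / self.count if self.count else 0.0

    def should_coordinate(self):
        # Coordinate only when flushes are slow enough to be worth waiting for.
        return self.average_ms > self.threshold_ms


def jiffies_to_ms(jiffies, hz):
    """Convert a jiffy count to milliseconds for a given HZ."""
    return jiffies * 1000.0 / hz
```

A tracker fed flush times of 2, 3, and 4ms would leave coordination off,
while one fed 10 and 20ms would turn it on; jiffies_to_ms(2, 250) gives 8.0.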

As for safety testing, I've been running power-fail tests on the single-disk
systems with the same ffsb profile.  So far I've observed a lot of fsck
complaints about orphaned inodes being truncated ("Truncating orphaned inode
1352607 (uid=0, gid=0, mode=0100700, size=4096)") though this happens
regardless of whether I run with this 2.6.36 test kernel of mine or a plain
vanilla 2.6.35 configuration.  I've not seen any serious corruption yet.

So, what do people think of these latest results?

--D

On Mon, Aug 23, 2010 at 11:31:19AM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> I retested the ext4 barrier mitigation patchset against a base of 2.6.36-rc1 +
> Tejun's flush_fua tree + Christoph's patches to change FS barrier semantics,
> and came up with this new spreadsheet:
> http://bit.ly/bWpbsT
> 
> Here are the previous 2.6.35 results for convenience: http://bit.ly/c22grd
> 
> The machine configurations are the same as with the previous (2.6.35)
> spreadsheet.  It appears to be the case that Tejun and Christoph's patches to
> change barrier use into simpler cache flushes generally improve the speed of
> the fsync-happy workload in buffered I/O mode ... if you have a bunch of
> spinning disks.  Results for the SSD array (elm3c44) and the single disk
> systems (elm3c65/elm3c75) decreased slightly.  For the case where direct I/O
> was used, the patchset improved the results in nearly all cases.  The speed
> with barriers on is getting closer to the speed with barriers off, thankfully!
> 
> Unfortunately, one thing that became /much/ less clear in these new results is
> the impact of the other patch sets that we've been working on to make ext4
> smarter with regard to barrier/flush use.  In most cases I don't really see
> the fsync-delay patch having much effect for directio, and it seems to have
> wild effects when buffered mode is used.  Jan Kara's barrier generation patch
> still generally helps with directio loads.  I've also concluded that my really
> old dirty-flag patchset from ages ago no longer has any effect.
> 
> What does everyone else think of these results?
> 
> --D
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html