Message-ID: <20150430014616.GZ15810@dastard>
Date: Thu, 30 Apr 2015 11:46:16 +1000
From: Dave Chinner <david@...morbit.com>
To: Daniel Phillips <daniel@...nq.net>
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
tux3@...3.org, Theodore Ts'o <tytso@....edu>
Subject: Re: Tux3 Report: How fast can we fsync?
On Tue, Apr 28, 2015 at 04:13:18PM -0700, Daniel Phillips wrote:
> Greetings,
>
> This post is dedicated to Ted, who raised doubts a while back about
> whether Tux3 can ever have a fast fsync:
>
> https://lkml.org/lkml/2013/5/11/128
> "Re: Tux3 Report: Faster than tmpfs, what?"
[snip]
> I measured fsync performance using a 7200 RPM disk as a virtual
> drive under KVM, configured with cache=none so that asynchronous
> writes are cached and synchronous writes translate into direct
> writes to the block device.
Yup, a slow single spindle, so fsync performance is determined by
seek latency of the filesystem. Hence the filesystem that "wins"
will be the filesystem that minimises fsync seek latency above all
other considerations.
http://www.spinics.net/lists/kernel/msg1978216.html
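(For concreteness: cache=none is the qemu/KVM drive cache mode, set
along the lines of

    -drive file=/dev/sdX,if=virtio,cache=none,format=raw

where /dev/sdX stands in for whatever backing device was actually
used. It opens the backing store with O_DIRECT on the host, so guest
cache flushes and synchronous writes go straight to the underlying
disk rather than through the host page cache.)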
So, to demonstrate, I'll run the same tests but using a 256GB
Samsung 840 EVO SSD and show how much the picture changes. I didn't
test tux3; you don't make it easy to get or build.
> To focus purely on fsync, I wrote a
> small utility (at the end of this post) that forks a number of
> tasks, each of which continuously appends to and fsyncs its own
> file. For a single task doing 1,000 fsyncs of 1K each, we have:
>
> Ext4: 34.34s
> XFS: 23.63s
> Btrfs: 34.84s
> Tux3: 17.24s
Ext4: 1.94s
XFS: 2.06s
Btrfs: 2.06s
All equally fast, so I can't see how tux3 would be much faster here.
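To make the test concrete, here is a minimal sketch of the kind of
harness being described: one forked task per file, appending 1K and
fsync()ing in a loop. This is a hypothetical reconstruction, not
Daniel's actual utility, which appeared at the end of the original
post.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s ntasks nsyncs\n", argv[0]);
		return 1;
	}
	int ntasks = atoi(argv[1]);	/* number of forked tasks */
	int nsyncs = atoi(argv[2]);	/* fsyncs per task */
	char buf[1024];			/* 1K appended per fsync */

	memset(buf, 'x', sizeof(buf));
	for (int i = 0; i < ntasks; i++) {
		if (fork() == 0) {
			char name[64];
			int fd;

			snprintf(name, sizeof(name), "fsync-%d.dat", i);
			fd = open(name, O_WRONLY | O_CREAT | O_APPEND, 0644);
			if (fd < 0) {
				perror("open");
				_exit(1);
			}
			for (int j = 0; j < nsyncs; j++) {
				if (write(fd, buf, sizeof(buf)) != sizeof(buf))
					_exit(1);
				fsync(fd);	/* one commit per append */
			}
			_exit(0);
		}
	}
	while (wait(NULL) > 0)		/* reap all children */
		;
	return 0;
}

Timed under time(1), arguments "1 1000" give the single-task numbers
above, and "10 10" through "10000 10" give the parallel numbers
below.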
> Things get more interesting with parallel fsyncs. In this test, each
> task does ten fsyncs and task count scales from ten to ten thousand.
> We see that all tested filesystems are able to combine fsyncs into
> group commits, with varying degrees of success:
>
> Tasks: 10 100 1,000 10,000
> Ext4: 0.79s 0.98s 4.62s 61.45s
> XFS: 0.75s 1.68s 20.97s 238.23s
> Btrfs:  0.53s   0.78s    3.80s    84.34s
> Tux3: 0.27s 0.34s 1.00s 6.86s
Tasks: 10 100 1,000 10,000
Ext4: 0.05s 0.12s 0.48s 3.99s
XFS: 0.25s 0.41s 0.96s 4.07s
Btrfs:  0.22s   0.50s    2.86s   161.04s
(lower is better)
Ext4 and XFS are fast and show similar performance. Tux3 *can't* be
very much faster as most of the elapsed time in the test is from
forking the processes that do the IO and fsyncs.
FWIW, btrfs shows its horrible fsync implementation here, burning
huge amounts of CPU to do bugger all IO. i.e. it burnt all 16 CPUs
for two and a half minutes in that 10,000 fork test, so it wasn't IO
bound at all.
> Is there any practical use for fast parallel fsync of tens of thousands
> of tasks? This could be useful for a scalable transaction server
> that sits directly on the filesystem instead of a database, as is
> the fashion for big data these days. It certainly can't hurt to know
> that if you need that kind of scaling, Tux3 will do it.
Ext4 and XFS already do that just fine, too, when you use storage
suited to such a workload and you have a sane interface for
submitting tens of thousands of concurrent fsync operations, e.g.
http://oss.sgi.com/archives/xfs/2014-06/msg00214.html
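That interface is AIO-based fsync submission. The Linux AIO ABI
already defines an IOCB_CMD_FSYNC opcode; whether the kernel and
filesystem actually honour it is exactly what the patch linked above
is about. A rough sketch of batched submission, assuming such
support, looks like this (error handling kept minimal for brevity):

#include <linux/aio_abi.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers: glibc does not expose the raw AIO syscalls. */
static long io_setup(unsigned nr, aio_context_t *ctx)
{
	return syscall(SYS_io_setup, nr, ctx);
}

static long io_submit(aio_context_t ctx, long nr, struct iocb **iocbs)
{
	return syscall(SYS_io_submit, ctx, nr, iocbs);
}

static long io_getevents(aio_context_t ctx, long min_nr, long nr,
			 struct io_event *events, struct timespec *ts)
{
	return syscall(SYS_io_getevents, ctx, min_nr, nr, events, ts);
}

/* Submit one async fsync per fd in a single batch, then wait for
 * all of them to complete. */
int fsync_batch(int *fds, int nfds)
{
	aio_context_t ctx = 0;
	struct iocb *cbs = calloc(nfds, sizeof(*cbs));
	struct iocb **ptrs = calloc(nfds, sizeof(*ptrs));
	struct io_event *evs = calloc(nfds, sizeof(*evs));
	int i;

	if (!cbs || !ptrs || !evs || io_setup(nfds, &ctx) < 0)
		return -1;
	for (i = 0; i < nfds; i++) {
		cbs[i].aio_fildes = fds[i];
		cbs[i].aio_lio_opcode = IOCB_CMD_FSYNC;
		ptrs[i] = &cbs[i];
	}
	if (io_submit(ctx, nfds, ptrs) != nfds)
		return -1;
	if (io_getevents(ctx, nfds, nfds, evs, NULL) != nfds)
		return -1;
	syscall(SYS_io_destroy, ctx);
	free(cbs); free(ptrs); free(evs);
	return 0;
}

With something like that, ten thousand fsyncs become one syscall to
submit and one to reap, leaving the filesystem free to merge them
into however few journal flushes it needs.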
> Of course, a pure fsync load could be viewed as somewhat unnatural. We
> also need to know what happens under a realistic load with buffered
> operations mixed with fsyncs. We turn to an old friend, dbench:
>
> Dbench -t10
>
> Tasks: 8 16 32
> Ext4: 35.32 MB/s 34.08 MB/s 39.71 MB/s
> XFS: 32.12 MB/s 25.08 MB/s 30.12 MB/s
> Btrfs: 54.40 MB/s 75.09 MB/s 102.81 MB/s
> Tux3: 85.82 MB/s 133.69 MB/s 159.78 MB/s
> (higher is better)
On an SSD (256GB Samsung 840 EVO), running kernel 4.0.0:
Tasks: 8 16 32
Ext4: 598.27 MB/s 981.13 MB/s 1233.77 MB/s
XFS: 884.62 MB/s 1328.21 MB/s 1373.66 MB/s
Btrfs: 201.64 MB/s 137.55 MB/s 108.56 MB/s
dbench looks *very different* when there is no seek latency,
doesn't it?
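(For anyone wanting to reproduce these: a ten second dbench run with
N clients is just "dbench -t 10 N", and the synchronous runs below
add the -s flag, which makes all file operations synchronous.)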
> Dbench -t10 -s (all file operations synchronous)
>
> Tasks: 8 16 32
> Ext4: 4.51 MB/s 6.25 MB/s 7.72 MB/s
> XFS: 4.24 MB/s 4.77 MB/s 5.15 MB/s
> Btrfs: 7.98 MB/s 13.87 MB/s 22.87 MB/s
> Tux3: 15.41 MB/s 25.56 MB/s 39.15 MB/s
> (higher is better)
Tasks:         8             16            32
Ext4:       173.54 MB/s  294.41 MB/s  424.11 MB/s
XFS:        172.98 MB/s  342.78 MB/s  458.87 MB/s
Btrfs:       36.92 MB/s   34.52 MB/s   55.19 MB/s
Again, the numbers are completely the other way around on an SSD,
with the conventional filesystems being 5-10x faster than the
WA/COW (write-anywhere, copy-on-write) style filesystem.
....
> In the full disclosure department, Tux3 is still not properly
> optimized in some areas. One of them is fragmentation: it is not
> very hard to make Tux3 slow down by running long tests. Our current
Oh, that still hasn't been fixed?
Until you sort out how you are going to scale allocation to tens of
TB without fragmenting free space over time, fsync performance of the
filesystem is pretty much irrelevant. Changing the allocation
algorithms will fundamentally alter the IO patterns and so all these
benchmarks are essentially meaningless.
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com