linux-kernel - Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)


Message-ID: <20150512235500.GF18150@thunk.org>
Date:	Tue, 12 May 2015 19:55:00 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	David Lang <david@...g.hm>
Cc:	Daniel Phillips <daniel@...nq.net>, Howard Chu <hyc@...as.com>,
	Dave Chinner <david@...morbit.com>,
	linux-kernel@...r.kernel.org,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Pavel Machek <pavel@....cz>, tux3@...3.org,
	linux-fsdevel@...r.kernel.org,
	OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
Subject: Re: xfs: does mkfs.xfs require fancy switches to get decent
 performance? (was Tux3 Report: How fast can we fsync?)

On Tue, May 12, 2015 at 03:35:43PM -0700, David Lang wrote:
> 
> I happen to think that it's correct. It's not that Ext4 isn't tested, but
> that people's expectations of how much it's been tested, and at what scale
> don't match the reality.

Ext4 is used at Google, on a very large number of disks.  Exactly how
large is not something I'm allowed to say, but there's a very amusing
TED Talk by Randall Munroe (of xkcd fame) on that topic:

http://tedsummaries.com/2014/05/14/randall-munroe-comics-that-ask-what-if/

One thing I can say is that shortly after we deployed ext4 at Google,
thanks to having a very large number of disks, and because we have
very good system monitoring, we detected a file system corruption
problem that happened with a very low probability, but we had enough
disks that we could detect the pattern.  (Fortunately, because
Google's cluster file system has replication and/or erasure coding, no
user data was lost.)  Even though we could see the problem, it took
us several months to track it down.
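
To illustrate why fleet size matters for spotting a low-probability
bug, here is a rough sketch of the arithmetic.  The per-disk
probability and the fleet sizes are made-up numbers for illustration
only (they are not Google's figures), and the model assumes each disk
hits the bug independently:

```python
import math


def prob_at_least_one(p_per_disk: float, n_disks: int) -> float:
    """Chance that at least one of n_disks independent disks hits the bug.

    Computes 1 - (1 - p)^n in a numerically stable way for tiny p.
    Requires 0 <= p_per_disk < 1.
    """
    return -math.expm1(n_disks * math.log1p(-p_per_disk))


def expected_events(p_per_disk: float, n_disks: int) -> float:
    """Expected number of disks that hit the bug in the same period."""
    return p_per_disk * n_disks


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-disk probability over some fixed period
    for n in (100, 10_000, 1_000_000):  # hypothetical fleet sizes
        print(n, expected_events(p, n), prob_at_least_one(p, n))
```

With a small test lab the event is almost never seen even once; with
a large enough fleet it shows up often enough that monitoring can
pick out a pattern.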

When we finally did, it turned out to be a race condition which only
took place under high memory pressure.  What was *very* amusing was
that after fixing the problem for ext4, I looked at ext3 and discovered
that (a) the bug ext4 had inherited was also in ext3, and (b) the
bug in ext3 had not been noticed in several enterprise distribution
testing runs done by Red Hat, SuSE, and IBM --- for well over a
**decade**.

What this means is that it's hard for *any* file system to be that
well tested; it's hard to substitute for years and years of production
use, hopefully in systems that have very rigorous monitoring so you
would notice if data or file system metadata is getting corrupted in
ways that can't be explained as hardware errors.  The fact that we
found a bug that was never discovered in ext3 after years and years of
use in many enterprises is a testimony to that fact.

(This is also why the fact that Facebook has started using btrfs in
production is going to be a very good thing for btrfs.  I'm sure they
will find all sorts of problems once they start running at large
scale, which is a _good_ thing; that's how those problems get fixed.)

Of course, using xfstests certainly helps a lot, and so in my opinion
all serious file system developers should be regularly using xfstests
as a part of their daily development cycle, and should be extremely
ruthless about not allowing any test regressions.
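
One way to be strict about regressions is to compare the set of
failing tests from each run against a known baseline and treat
anything new as a blocker.  The sketch below assumes you have already
collected the failing test names from each run by whatever means you
like; the function name and the test names are hypothetical
placeholders, not real xfstests output:

```python
def find_regressions(baseline_failures, current_failures):
    """Return tests that fail now but did not fail in the baseline run."""
    return sorted(set(current_failures) - set(baseline_failures))


if __name__ == "__main__":
    # Hypothetical failure lists; the names are placeholders only.
    baseline = {"generic/001"}
    current = {"generic/001", "generic/042"}

    regressions = find_regressions(baseline, current)
    if regressions:
        print("New failures:", ", ".join(regressions))
    else:
        print("No new failures")
```

A test that was already failing in the baseline is not reported
again, so the output stays focused on what the latest change broke.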

Best regards,

					- Ted