linux-kernel - Re: Nature of ext4 corruption fixed by recent patch?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150519163739.GC2598@x>
Date:	Tue, 19 May 2015 09:37:40 -0700
From:	Josh Triplett <josh@...htriplett.org>
To:	Theodore Ts'o <tytso@....edu>, Lukas Czerner <lczerner@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: Nature of ext4 corruption fixed by recent patch?

On Tue, May 19, 2015 at 09:40:05AM -0400, Theodore Ts'o wrote:
> On Mon, May 18, 2015 at 03:58:24PM -0700, josh@...htriplett.org wrote:
> > 
> > I recently had my server's filesystem implode, and I'm currently in the
> > process of cleaning it up.  It had widespread corruption in files and
> > directories scattered across the filesystem, though all vaguely recently
> > changed.  Directories appeared corrupted or truncated, various files
> > showed up as piles of NULs, and 5000+ files and directories ended up in
> > lost+found.  I observed this corruption shortly after a reboot into
> > 4.0.2 (from a previous kernel of 3.16), with ext4 noticing an
> > inconsistency and mounting the filesystem read-only.  The underling
> > disks had no errors.
> > 
> > Reading about the corruption issue fixed by
> > d2dc317d564a46dfc683978a2e5a4f91434e9711 ("ext4: fix data corruption
> > caused by unwritten and delayed extents"), it sounds plausible.  Can
> > that strike both file data and directory data, assuming all of that data
> > ended up grouped with a delayed extent?  Would that bug manifest as
> > corrupted directories and files filled with NULs?  The system is a
> > 72-way server on which I was doing piles of parallel git pulls and
> > builds, so hitting a race seems plausible.
> 
> Unfortunately, I don't think you can blame all of your problems on the
> bug fixed by this particular bug.  First of all, it doesn't apply to
> directories at all; secondly, it's been around for a long time.  I'd
> have to check and see whether or not 3.16 had the problem, but it
> wouldn't surprise me at all.  Finally, git pulls and builds are not
> at all likely to hit the problem.
>
> It requires the combination of (a) writing to a portion of a file that
> was not previously allocated using buffered I/O, (b) an fallocate of a
> region of the file which is a superset of region written in (a) before
> it has chance to be written to disk, (c) waiting for the file data in
> (a) to be written out to disk (either via fsync or via the writeback
> daemons), and then (d) before the extent status cache gets pushed out
> of memory, another random write to a portion of the file covered by
> (a) -- in which case that specific portion of (a) could be replaced by
> all zeros.
> 
> Even most database or torrent downloads are not likely to hit this
> pattern, since it requires an fallocate of a previous previously (and
> very recently) allocated region of a file using a buffered write.
> Torrent downloads will tend to fallocate the whole file in advance,
> and while Oracle or DB2 might intermix writes and fallocates, they
> don't fallocate previously written regions of the file, and they use
> direct I/O in any case.

Ah, thanks for the clarification. :(

In particular, I didn't realize this was *only* the data of the
delayed-extent-based files.  The bug here seems to have struck various
recently-written files and directories.  (Recent in days, not seconds,
as far as I can tell; and it isn't universal based on age.) The initial
symptom was ext4 noticing that a directory was corrupt (truncated, IIRC)
and immediately marking the whole filesystem read-only.

> So it's pretty hard to hit this bug by accident, unless you happen to
> be using fsx, and even then, the only files that would get corrupted
> would be the files being written using fsx.  So I'm afraid you'll have
> to look farther afield, and consider other bugs as well as potential
> hardware problems before trusting the system again.

I'm quite skeptical of hardware problems.  The system is a few months
old, well past infant-mortality and too young for burnout.  And I've
tested the disks carefully.

Are there any other known bugs that seem likely to fit the symptoms and
circumstances?

Note that since I saw this after rebooting from 3.16 into 4.0.2, I don't
know whether the corruption was more likely caused by 3.16 or 4.0.2.

> P.S.  It's bugs like these which is why I'm always amused by people
> who think that just because a file system is safely being used by
> their developers, that it's safe to throw production workloads on
> them.

Heh.  Yeah, I like exciting new software in most areas, but not in
filesystems.  In filesystems I prefer boring. :)

> These sorts of subtle data corruptors tend to be highly timing
> depend, and very hard to find.  Sometimes these bugs can hang around
> for years before they are found and fixed.  The flip side is that
> fortunately, they tend to strike very rarely.

...lucky me.

> It's also why I'm very
> grateful for developers like Jan and Lukas.  :-)

Indeed.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/