linux-ext4 - Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

Message-ID: <87txtfy4o2.fsf@spindle.srvr.nix>
Date:	Sat, 27 Oct 2012 22:40:45 +0100
From:	Nix <nix@...eri.org.uk>
To:	Eric Sandeen <sandeen@...hat.com>
Cc:	"Theodore Ts'o" <tytso@....edu>, linux-ext4@...r.kernel.org
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

On 27 Oct 2012, Eric Sandeen stated:

> On 10/27/12 4:29 PM, Nix wrote:
>> (But, seriously, fsstress is a wonderful thing. And the kernel's test
>> culture *is* improving, and I'm happy to see filesystem hackers in the
>> front line.)
>
> I've been testing with a hacked up devicemapper target which creates
> a "dirty" snapshot which requires a replay; saves the actual power
> drop & restore cycle, and I could repro the journal_checksum bug
> right off.

I'm just not sure why a umount -l of an unused-but-mounted dirty
filesystem followed immediately by a reboot() is triggering a journal
replay at all.

If the umount has started, it should complete before the reboot and mark
the fs clean and !needs_recovery, no matter how much dirty data it has
to write -- all my testing in virtualization does just that -- but it
clearly isn't working that way on real hardware (or, if it is, something
is vaping the controller's cache after the umount has finished, which is
pretty disturbing: nothing but simultaneous failure of two or more
drives or the battery should be able to vape that cache before it is
flushed, certainly not anything as simple as a device disconnection /
reboot).
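
Whether the umount really got as far as marking the fs clean can be checked straight from the superblock before the next mount. A minimal sketch, assuming the usual ext2/3/4 on-disk layout (primary superblock at byte 1024, little-endian fields); the function name and the returned dict are invented for illustration, and dumpe2fs -h reports the same two bits:

```python
import struct

SUPERBLOCK_OFFSET = 1024    # primary superblock starts 1024 bytes into the device
EXT4_MAGIC = 0xEF53         # s_magic, at offset 0x38 within the superblock
STATE_VALID = 0x0001        # s_state bit (offset 0x3A): cleanly unmounted
INCOMPAT_RECOVER = 0x0004   # s_feature_incompat bit (offset 0x60): needs_recovery


def journal_state(image: bytes) -> dict:
    """Decode clean / needs_recovery from the first 2 KiB of an ext4 device."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 0x64:
        raise ValueError("image too short to hold a superblock")
    magic, state = struct.unpack_from("<HH", sb, 0x38)
    (incompat,) = struct.unpack_from("<I", sb, 0x60)
    if magic != EXT4_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    return {
        "clean": bool(state & STATE_VALID),
        "needs_recovery": bool(incompat & INCOMPAT_RECOVER),
    }
```

Feeding it the first 2048 bytes of the unmounted block device (read as root) right after the reboot should, if the umount completed, give clean True and needs_recovery False; anything else means the flush never reached the disk.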

> XFS has an ioctl to make this easy in regression testing, and several
> tests in xfstests do cover xfs journal recovery.  We need
> to add such a thing to ext4.  Not being able to programmatically
> test recovery is a problem.

True enough.

You can rest assured that I will continue being a test load if necessary --
though for now I have removed journal_async_commit from my mount options,
at least until this bug is fixed, because I don't like being a test load
*that* much!
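
For anyone wanting to check whether their own machines are affected, here is a rough sketch that scans mount-table text for ext4 filesystems using either of the two options named in this thread. It assumes the kernel lists these options in /proc/mounts when they are set; the function name and the choice of options to flag are mine, not anything official:

```python
# Options discussed in this thread; the set itself is an illustrative choice.
RISKY_OPTIONS = {"journal_checksum", "journal_async_commit"}


def risky_ext4_mounts(mounts_text: str) -> dict:
    """Map mount point -> sorted list of flagged options it is mounted with."""
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        hits = sorted(RISKY_OPTIONS.intersection(fields[3].split(",")))
        if hits:
            found[fields[1]] = hits
    return found


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mount_point, options in risky_ext4_mounts(f.read()).items():
            print(mount_point, ",".join(options))
```

An empty result only says the options are not active on currently mounted filesystems; it says nothing about what is in fstab for the next boot.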

-- 
NULL && (void)