linux-ext4 - Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

Message-ID: <20121024210819.GA5484@thunk.org>
Date:	Wed, 24 Oct 2012 17:08:19 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Nix <nix@...eri.org.uk>
Cc:	Eric Sandeen <sandeen@...hat.com>, linux-ext4@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	"J. Bruce Fields" <bfields@...ldses.org>,
	Bryan Schumaker <bjschuma@...app.com>,
	Peng Tao <bergwolf@...il.com>, Trond.Myklebust@...app.com,
	gregkh@...uxfoundation.org,
	Toralf Förster <toralf.foerster@....de>
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3
 (and other stable branches?)

On Wed, Oct 24, 2012 at 09:45:47PM +0100, Nix wrote:
> 
> It occurs to me that it is possible that this bug hits only those
> filesystems for which a umount has started but been unable to complete.
> If so, this is a relatively rare and unimportant bug which probably hits
> only me and users of slow removable filesystems in the whole world...

Can you verify this?  Does the bug show up if you just hit the power
switch while the system is booted?

How about changing the "sleep 2" to "sleep 0.5"?  (Feel free to
unmount your other partitions, and just leave a test file system
mounted to minimize the chances that you lose partitions that require
hours and hours to restore...)

If you can get a very reliable repro, we might have to ask you to try
the following experiments:

0) Make sure the reliable repro does _not_ trigger the bug with 3.6.1 booted

1) Try a 3.6.2 kernel

2) (If the problem shows up above) try a 3.6.2 kernel with 14b4ed2 reverted

3) (If the problem shows up above) try a 3.6.2 kernel with all of the
   ext4-related patches reverted:
92b7722 ext4: fix mtime update in nodelalloc mode
34414b2 ext4: fix fdatasync() for files with only i_size changes
12ebdf0 ext4: always set i_op in ext4_mknod()
22a5672 ext4: online defrag is not supported for journaled files
ba57d9e ext4: move_extent code cleanup
2fdb112 ext4: fix crash when accessing /proc/mounts concurrently
1638f1f ext4: fix potential deadlock in ext4_nonda_switch()
5018ddd ext4: avoid duplicate writes of the backup bg descriptor blocks
256ae46 ext4: don't copy non-existent gdt blocks when resizing
416a688 ext4: ignore last group w/o enough space when resizing instead of BUG'ing
14b4ed2 jbd2: don't write superblock when if its empty

4) (If the problem still shows up) then we may need to do a full
   bisect to figure out what is going on....
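
For anyone following along, experiments 2 and 3 above could be scripted
roughly as below. This is only a sketch, not something from the original
message: the helper names, the "v3.6.2" tag, and the assumption that the
abbreviated commit IDs resolve in your local stable tree are all
hypothetical. It reverts in the order listed, which may not apply cleanly,
so it only builds the git command lines and leaves running them (and
resolving any conflicts) to you.

```python
# Hypothetical helper: build the git commands for experiments 2 and 3.
# Assumptions (not from the original message): a local stable-tree clone
# with a "v3.6.2" tag in which the abbreviated commit IDs below resolve.

# Commit IDs exactly as listed in the message, in the listed order.
EXT4_JBD2_COMMITS = [
    "92b7722", "34414b2", "12ebdf0", "22a5672", "ba57d9e", "2fdb112",
    "1638f1f", "5018ddd", "256ae46", "416a688", "14b4ed2",
]


def revert_commands(commits, base_tag="v3.6.2"):
    """Return git command lines: check out base_tag, then revert each commit."""
    cmds = [["git", "checkout", base_tag]]
    for commit in commits:
        # --no-edit keeps git's default revert commit message.
        cmds.append(["git", "revert", "--no-edit", commit])
    return cmds


def plan(step):
    """Map an experiment number from the list above to its git commands."""
    if step == 2:
        return revert_commands(["14b4ed2"])
    if step == 3:
        return revert_commands(EXT4_JBD2_COMMITS)
    raise ValueError("only experiments 2 and 3 involve reverts")


if __name__ == "__main__":
    for cmd in plan(3):
        print(" ".join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to
run anywhere; pipe the output to a shell only inside a scratch checkout.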

   	     	     	    	     	- Ted