linux-ext4 - Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

Message-ID: <508AF639.30603@onlinehome.de>
Date:	Fri, 26 Oct 2012 22:44:41 +0200
From:	Martin <marogge@...inehome.de>
To:	Nix <nix@...eri.org.uk>
CC:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-ext4@...r.kernel.org, tytso@....edu, stable@...r.kernel.org,
	gregkh@...uxfoundation.org
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3
 (and other stable branches?)

On 10/26/2012 10:24 PM, Nix wrote:
> On 26 Oct 2012, Martin spake thusly:
[...]
>> I have studied my corruption problem more closely and can give you a
>> description of what happened below. Would you say this may be the same
>> bug?
>
> No. You want to keep up with the thread. Ted's first educated guess is
> not always guaranteed to be correct (though this is rare).

OK

>
>> Oct 15 19:56:12
>>
>> Computer is booted again in order to copy a few files to memory stick. Unbeknownst to me, the following entries are logged in the
>> system log:
>>
>> Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5): add_dirent_to_buf:1587: inode #655361: block 2629945: comm mount: bad
>> entry in directory: rec_len % 4 != 0 - offset=360(360), inode=655682, rec_len=18, name_len=5
>> Oct 15 20:00:16 harold kernel: Aborting journal on device sda5-8.
>> Oct 15 20:00:16 harold kernel: EXT4-fs (sda5): Remounting filesystem read-only
>> Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5) in ext4_evict_inode:238: Journal has aborted
>> Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5) in ext4_create:2120: IO failure
>
> That's an interesting failure, but looks slightly different to what I
> saw. No bad directory entries, no aborted journals: a replayed journal
> and subsequent corruption. Still damaged though, and after a journal
> abort I'm not surprised you had problems!

So my corrupt journal is simply the result of a user turning off the
machine at a bad point in time? That's scary. In that scenario even the
data=journal option wouldn't save me from harm, would it?

Funny that this happens to someone who has always said that robustness
is the most important quality of a filesystem (and who thinks
data=writeback is madness).

>
>>                            I will try to rename them to their
>> proper name on another machine, and restore them on the target
>> machine. However, due to the sheer number this might take forever.
>
> I relearned this week that backups are good.

Backups are good, and always too old.

>
>> Also I am worried the problem might re-surface, as it has neither been
>> identified nor fixed.
>
> I'm seeing it on almost every reboot.

Indeed, the symptoms look different.

>
>> NB: kernel was v3.5.5
>
> Hm, this provides possible evidence that the problem does indeed extend
> into 3.5.x.
>
>> with CK1 and BFQ patches, tainted by nvidia module.
>
> It's hard to reason about a kernel that's had *that* massive lump of
> binary junk applied to it, alas. This may or may not be the same
> problem: it has some common features with what I see, but not all.
>

True, I normally reproduce problems with vanilla kernels before
reporting them. In this case I was cleanly sniped, with no chance of a
replay so far.
