linux-ext4 - RE: ext4 corruption during unexpected power cycle in the middle of writing


Message-ID: <2CE44BD3DBCF9541909CCB42F11CA392825D27@SFO1EXC-MBXP06.nbttech.com>
Date:	Wed, 6 Jun 2012 05:44:47 +0000
From:	Ming Lei <Ming.Lei@...erbed.com>
To:	Eric Sandeen <sandeen@...hat.com>
CC:	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: RE: ext4 corruption during unexpected power cycle in the middle of
 writing

Is this behavior documented somewhere?

-----Original Message-----
From: Eric Sandeen [mailto:sandeen@...hat.com] 
Sent: Tuesday, June 05, 2012 10:32 PM
To: Ming Lei
Cc: linux-ext4@...r.kernel.org
Subject: Re: ext4 corruption during unexpected power cycle in the middle of writing

On 6/6/12 12:24 AM, Ming Lei wrote:
> I ran a power cycle test in the middle of a file write, and after bootup I ran a forced fsck and found two errors (if I run fsck -p -v I don't see the errors). From what I saw, I think it is filesystem metadata corruption. Fsck can repair it, but each time I run the same test I hit the same issue. 
> 
> I don't think it is relevant, but I want to point out that sda6 shares the same drive with another partition, sda3, which is used in the RAID6 array for /var.
> 
> The same issue occurs whether barriers are on or off, and whether the disk drive write cache is enabled or disabled. The test result shown below is with barriers on and the disk write cache disabled. 
> 
> I am using a 2.6.32 (SL6) kernel. I also see the same issue with a 2.6.9-based kernel on the same hardware with an ext3 filesystem.
> 
> My question is:
> 1) Is the issue caused from something unique in my box? Configuration error?
> 2) Is it possible my version of fsck reported false errors?

Sort of.  You got:

> Free blocks count wrong (118366120, counted=76269471).
> Fix? yes
> 
> Free inodes count wrong (30081013, counted=30081004).
> Fix? yes

Those are the superblock counters, which aren't journaled; only the block group (bg) counters are logged via the journal, IIRC.

They aren't false... they are just expected due to the design I'm afraid.

If you had mounted, unmounted, and then fsck'd, you wouldn't have seen errors, because at mount time the superblock counters get updated from all of the individual bg counters in ext4_fill_super:

        /*
         * The journal may have updated the bg summary counts, so we
         * need to update the global counters.
         */

> 3) Is this a known issue? Is it a kernel bug?

yes.  Not really.  ;)

> 4) How do I find what's wrong?

I think this is by design, though it is perhaps a little unfortunate, since nobody expects fsck errors on a journaling filesystem after a crash...

I ran into this same thing when doing recovery testing for > 16T filesystems.

-Eric
