Message-id: <20090924182749.GB10562@webber.adilger.int>
Date: Thu, 24 Sep 2009 12:27:49 -0600
From: Andreas Dilger <adilger@....com>
To: Curt Wohlgemuth <curtw@...gle.com>
Cc: ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: ext4 inode corruption

On Sep 23, 2009 15:50 -0700, Curt Wohlgemuth wrote:
> Sorry to reply to self, but I'm now pretty sure that I understand this
> problem. (Of course this insight came mere hours after I sent this
> email -- and not in the previous 4 days of staring at it.)
>
> It's likely the same issue fixed by
>
> commit 1b774f669b4b02f4d2abf2792362ab72a2e124ab
> ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}()

I was going to say that this sounded like a familiar problem, but you
already did the leg (well, mouse) work.

> In the previous case, in no-journal mode an about-to-be-freed metadata
> block is marked dirty and available for writeback. The block is then
> marked free, and re-used as a data block for a different inode; the
> writeback takes place, corrupting the data block.
>
> In this case, the newly-freed block is re-used as a *metadata* block
> for a different inode. Hence the same pattern we were seeing before:
> eh_entries = 0, eh_max = 340.
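
As a rough sanity check on those numbers: eh_max = 340 with eh_entries = 0
is what a freshly initialized, still-empty leaf would look like if the
filesystem uses 4 KiB blocks (the block size is an assumption here, though
i_blocks = 8 sectors of 512 bytes is consistent with it). The sketch below
only does the arithmetic. The struct mirrors the 12-byte on-disk extent
header; the names ext_header, EXT_ENTRY_SIZE and ext_capacity are made up
for illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative copy of the 12-byte on-disk extent header layout.
 * Little-endian conversion is ignored; this is not the kernel's struct.
 */
struct ext_header {
	uint16_t eh_magic;      /* 0xf30a */
	uint16_t eh_entries;    /* number of valid entries */
	uint16_t eh_max;        /* capacity of this node in entries */
	uint16_t eh_depth;      /* 0 for a leaf */
	uint32_t eh_generation;
};

/* Extent and index entries are both 12 bytes on disk. */
enum { EXT_ENTRY_SIZE = 12 };

/*
 * Number of entries that fit after the header in a region of
 * space_bytes: a whole block for a leaf, or the 60-byte i_block
 * area for the in-inode root.
 */
static unsigned ext_capacity(size_t space_bytes)
{
	assert(space_bytes >= sizeof(struct ext_header));
	return (unsigned)((space_bytes - sizeof(struct ext_header))
			  / EXT_ENTRY_SIZE);
}
```

With a 4096-byte block this gives 340, and with the 60-byte in-inode area
it gives 4, which matches the eh_max values in the dump quoted further down.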
>
> These inodes were left on systems from kernels without the above
> patch. Accessing the files on *patched* kernels will still make the
> BUG fire, hence the confusion.
>
> Thanks,
> Curt
>
>
> On Wed, Sep 23, 2009 at 9:27 AM, Curt Wohlgemuth <curtw@...gle.com> wrote:
> > We've been seeing sporadic inode corruption on our ext4 partitions which
> > we've been trying to analyze, without much success. I'm wondering if
> > anybody might have some clues as to where things might be going wrong.
> >
> > We find out about the corruption via a BUG firing in ext4_ext_get_blocks():
> >
> >         /*
> >          * consistent leaf must not be empty;
> >          * this situation is possible, though, _during_ tree modification;
> >          * this is why assert can't be put in ext4_ext_find_extent()
> >          */
> >         BUG_ON(path[depth].p_ext == NULL && depth != 0);
> >
> > Of course, this fires long after the inode in question is corrupted. With
> > some diagnostics added in front of this bug, we can find the inodes; they
> > all have characteristics like this:
> >
> > Output from debugfs' stat command:
> >
> > Inode: 1195575 Type: regular Mode: 0600 Flags: 0x80000
> > Generation: 2821101782 Version: 0x00000001
> > User: 35800 Group: 5000 Size: 8400896
> > File ACL: 0 Directory ACL: 0
> > Links: 1 Blockcount: 8
> > Fragment: Address: 0 Number: 0 Size: 0
> > ctime: 0x4a9f8009 -- Thu Sep 3 01:36:25 2009
> > atime: 0x4a9f7ff7 -- Thu Sep 3 01:36:07 2009
> > mtime: 0x4a9f8009 -- Thu Sep 3 01:36:25 2009
> > EXTENTS:
> >
> > Note that no data blocks are printed out here.
> >
> > Following the actual extent tree, it always looks like this:
> >
> > in-inode extent header:
> > eh_magic: 0xf30a
> > eh_entries: 1
> > eh_max: 4
> > eh_depth: 1
> >
> > in-inode extent index 0:
> > ei_block: 0
> > ei_leaf_lo: 36738577
> > ei_leaf_hi: 0
> >
> > leaf node header (at block 36738577):
> > eh_magic: 0xf30a
> > eh_entries: 0
> > eh_max: 340
> > eh_depth: 0
> >
> > The i_size value of the inode will vary, from 8192 to 8400896. But the
> > i_blocks value is *always* 8.
> >
> > The extent tree always has depth of 1 in the in-inode header, and a valid
> > leaf node header; but the leaf node header always has 0 entries. This is
> > what's causing the BUG above to fire.
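
For anyone who wants to scan for leftover inodes of this shape offline, the
pattern described above can be expressed as a small userspace predicate.
This is only a sketch under stated assumptions: the struct, the sample
values (copied from the dump above) and the function name leaf_trips_bug
are hypothetical, and reading the headers off disk is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_MAGIC 0xf30a

/* Only the header fields that matter for this check. */
struct ext_header_view {
	uint16_t eh_magic;
	uint16_t eh_entries;
	uint16_t eh_max;
	uint16_t eh_depth;
};

/* Values taken from the extent dump quoted above. */
static const struct ext_header_view sample_root = { EXT_MAGIC, 1, 4, 1 };
static const struct ext_header_view sample_leaf = { EXT_MAGIC, 0, 340, 0 };

/*
 * Returns true when a leaf with a valid header but zero entries hangs
 * below a root of non-zero depth, i.e. the state in which p_ext would
 * be NULL at depth != 0 and the BUG_ON would fire.
 */
static bool leaf_trips_bug(const struct ext_header_view *root,
			   const struct ext_header_view *leaf)
{
	assert(root != NULL && leaf != NULL);

	if (root->eh_magic != EXT_MAGIC || leaf->eh_magic != EXT_MAGIC)
		return false;   /* not a well-formed extent node at all */

	return root->eh_depth != 0 &&
	       leaf->eh_depth == 0 &&
	       leaf->eh_entries == 0;
}
```

An empty extent header that lives directly in the inode (depth 0) is
deliberately not flagged, since the BUG_ON condition excludes depth == 0.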
> >
> > We believe the general pattern of user space calls to create these files is
> > something like this:
> >
> > open(O_DIRECT)
> > fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 8400896)
> > < various writes to the file >
> > fallocate(fd, 0, 0, actual_size + BLOCK_SIZE)
> > ftruncate(fd, actual_size)
> >
> > The second fallocate() call without KEEP_SIZE allows the following
> > ftruncate to actually truncate the file -- a known issue recently fixed by
> > Jiaying Zhang (but her fix is not in our kernel yet). "actual_size" can be
> > 0 at times.
> >
> > I can't think of any actions that would cause the i_size to be so large
> > while the i_blocks value is always 8. Looking at the code in
> >
> > ext4_ext_remove_space()
> > ext4_ext_rm_leaf()
> > ext4_ext_rm_idx()
> >
> > I don't see a way for the extent tree to take the shape above. There are no
> > errors that I can see around the time the corrupted inodes are created. It
> > *seems* as though the corruption is coming during truncation, but all our
> > efforts to reproduce this with small test cases have so far failed.
> >
> > We're using a 2.6.26 code base, with most of the latest ext4 patches
> > applied.
> >
> > Any insights/ruminations/guesses as to what might be happening are welcome.
> >
> > Thanks,
> > Curt
> >
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html