Message-ID: <6601abe90909231550g5b55f277l218560c827693322@mail.gmail.com>
Date: Wed, 23 Sep 2009 15:50:53 -0700
From: Curt Wohlgemuth <curtw@...gle.com>
To: ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: ext4 inode corruption
Sorry to reply to self, but I'm now pretty sure that I understand this
problem. (Of course this insight came mere hours after I sent this
email -- and not in the previous 4 days of staring at it.)
It's likely the same issue fixed by:
    commit 1b774f669b4b02f4d2abf2792362ab72a2e124ab
    ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}()
In the previous case, in no-journal mode an about-to-be-freed metadata
block is marked dirty and available for writeback. The block is then
marked free, and re-used as a data block for a different inode; the
writeback takes place, corrupting the data block.
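To make the ordering concrete, here is a toy model of that sequence. It is only a sketch: `ToyBlockLayer`, `reuse_after_free` and the block number 100 are made up for illustration and are not kernel interfaces. The `forget` flag stands in for whether the dirty buffer is dropped (as bforget() would do) when the block is freed.

```python
# Toy model only: "disk" and "dirty" stand in for the block device and the
# set of dirty buffers. None of these names are kernel APIs.

class ToyBlockLayer:
    def __init__(self):
        self.disk = {}    # block number -> bytes currently on "disk"
        self.dirty = {}   # block number -> buffered contents awaiting writeback

    def mark_dirty(self, block, contents):
        self.dirty[block] = contents

    def free_block(self, block, forget):
        # forget=True models dropping the dirty buffer when the block is freed.
        # forget=False models the buggy path: the buffer stays dirty.
        if forget:
            self.dirty.pop(block, None)

    def direct_write(self, block, contents):
        # The new owner's contents go straight to "disk".
        self.disk[block] = contents

    def writeback(self):
        # Flush every buffer that is still dirty, whoever owns the block now.
        for block, contents in self.dirty.items():
            self.disk[block] = contents
        self.dirty.clear()


def reuse_after_free(forget):
    layer = ToyBlockLayer()
    layer.mark_dirty(100, b"stale metadata from old inode")
    layer.free_block(100, forget=forget)
    layer.direct_write(100, b"contents for new inode")
    layer.writeback()
    return layer.disk[100]
```

With `forget=False` the late writeback clobbers the new owner's contents; with `forget=True` the new contents survive.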
In this case, the newly-freed block is re-used as a *metadata* block
for a different inode. Hence the same pattern we were seeing before:
eh_entries = 0, eh_max = 340.
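For anyone who wants to scan for this pattern, a rough user-space sketch of the check follows. It assumes a 4096-byte block size (my inference from eh_max = 340, not something stated above) and the usual 12-byte little-endian extent header followed by 12-byte entries. The helper names are mine.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A
# eh_magic, eh_entries, eh_max, eh_depth (16-bit each), eh_generation (32-bit)
HEADER_FMT = "<HHHHI"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 12 bytes
ENTRY_SIZE = 12                            # one extent or index entry


def parse_extent_header(block):
    """Decode the extent header at the start of a raw block."""
    magic, entries, eh_max, depth, generation = struct.unpack_from(HEADER_FMT, block)
    return {
        "eh_magic": magic,
        "eh_entries": entries,
        "eh_max": eh_max,
        "eh_depth": depth,
        "eh_generation": generation,
    }


def max_entries(block_size):
    """Entries that fit in one tree block after the header."""
    return (block_size - HEADER_SIZE) // ENTRY_SIZE


def looks_like_empty_leaf(block, block_size=4096):
    """True for a valid-looking leaf header that holds no extents."""
    hdr = parse_extent_header(block)
    return (
        hdr["eh_magic"] == EXT4_EXT_MAGIC
        and hdr["eh_depth"] == 0
        and hdr["eh_entries"] == 0
        and hdr["eh_max"] == max_entries(block_size)
    )
```

Under the 4096-byte assumption, `max_entries(4096)` works out to (4096 - 12) // 12 = 340, which matches the eh_max in the dump.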
These inodes were corrupted on disk by kernels without the above
patch. Because the damage is persistent, accessing the files on
*patched* kernels will still make the BUG fire, hence the confusion.
Thanks,
Curt
On Wed, Sep 23, 2009 at 9:27 AM, Curt Wohlgemuth <curtw@...gle.com> wrote:
> We've been seeing sporadic inode corruption on our ext4 partitions which
> we've been trying to analyze, without much success. I'm wondering if
> anybody might have some clues as to where things might be going wrong.
>
> We find out about the corruption via a BUG firing in ext4_ext_get_blocks():
>
>     /*
>      * consistent leaf must not be empty;
>      * this situation is possible, though, _during_ tree modification;
>      * this is why assert can't be put in ext4_ext_find_extent()
>      */
>     BUG_ON(path[depth].p_ext == NULL && depth != 0);
>
> Of course, this fires long after the inode in question is corrupted. With
> some diagnostics added in front of this bug, we can find the inodes; they
> all have characteristics like this:
>
> Output from debugfs' stat command:
>
> Inode: 1195575 Type: regular Mode: 0600 Flags: 0x80000
> Generation: 2821101782 Version: 0x00000001
> User: 35800 Group: 5000 Size: 8400896
> File ACL: 0 Directory ACL: 0
> Links: 1 Blockcount: 8
> Fragment: Address: 0 Number: 0 Size: 0
> ctime: 0x4a9f8009 -- Thu Sep 3 01:36:25 2009
> atime: 0x4a9f7ff7 -- Thu Sep 3 01:36:07 2009
> mtime: 0x4a9f8009 -- Thu Sep 3 01:36:25 2009
> EXTENTS:
>
> Note that no data blocks are printed out here.
>
> Following the actual extent tree, it always looks like this:
>
> in-inode extent header:
>     eh_magic:   0xf30a
>     eh_entries: 1
>     eh_max:     4
>     eh_depth:   1
>
> in-inode extent index 0:
>     ei_block:   0
>     ei_leaf_lo: 36738577
>     ei_leaf_hi: 0
>
> leaf node header (at block 36738577):
>     eh_magic:   0xf30a
>     eh_entries: 0
>     eh_max:     340
>     eh_depth:   0
>
> The i_size value of the inode will vary, from 8192 to 8400896. But the
> i_blocks value is *always* 8.
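A quick arithmetic sketch of that mismatch, assuming i_blocks is counted in 512-byte units and the filesystem uses 4096-byte blocks (both are assumptions on my part, and the helper names are made up):

```python
SECTOR_SIZE = 512  # assumed unit for i_blocks


def blocks_accounted(i_blocks, block_size=4096):
    """Filesystem blocks charged to the inode, per i_blocks."""
    return (i_blocks * SECTOR_SIZE) // block_size


def blocks_needed(i_size, block_size=4096):
    """Filesystem blocks needed to back i_size bytes with no holes."""
    return -(-i_size // block_size)  # ceiling division
```

Under those assumptions an i_blocks of 8 accounts for a single block, while an i_size of 8400896 would need 2051 blocks if fully backed. One charged block would be consistent with the extent leaf block being the only block the inode owns.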
>
> The extent tree always has depth of 1 in the in-inode header, and a valid
> leaf node header; but the leaf node header always has 0 entries. This is
> what's causing the BUG above to fire.
>
> We believe the general pattern of user space calls to create these files is
> something like this:
>
> open(O_DIRECT)
> fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 8400896)
> < various writes to the file >
> fallocate(fd, 0, 0, actual_size + BLOCK_SIZE)
> ftruncate(fd, actual_size)
>
> The second fallocate() call without KEEP_SIZE allows the following
> ftruncate to actually truncate the file -- a known issue recently fixed by
> Jiaying Zhang (but her fix is not in our kernel yet). "actual_size" can be
> 0 at times.
>
> I can't think of any sequence of actions that would leave i_size so large
> while i_blocks is always 8. Looking at the code in
>
> ext4_ext_remove_space()
> ext4_ext_rm_leaf()
> ext4_ext_rm_idx()
>
> I don't see a way for the extent tree to take the shape above. There are no
> errors that I can see around the time the corrupted inodes are created. It
> *seems* as though the corruption is coming during truncation, but all our
> efforts to reproduce this with small test cases have so far failed.
>
> We're using a 2.6.26 code base, with most of the latest ext4 patches
> applied.
>
> Any insights/ruminations/guesses as to what might be happening are welcome.
>
> Thanks,
> Curt
>