Message-ID: <6601abe90909081121p17b154a4s2e6852da2b71951f@mail.gmail.com>
Date: Tue, 8 Sep 2009 11:21:11 -0700
From: Curt Wohlgemuth <curtw@...gle.com>
To: Valerie Aurora <vaurora@...hat.com>
Cc: ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: Odd "leak" of extent info into data blocks?
Hi Valerie:
On Tue, Sep 8, 2009 at 10:56 AM, Valerie Aurora<vaurora@...hat.com> wrote:
> Hey, did you figure this out? If not, I want to have a bug open
> somewhere.
Yes, sorry. I was going to post a patch for this, but have been
waiting to verify that it really fixes the issue. And see the thread
started by Frank Mayhar about fsync issues as well...
The problem is a race between the last write to a to-be-freed
metadata block (to update the extent header) and the block being
marked free in the on-disk/buddy bitmaps. Note that this only happens
without a journal; *with* a journal the ordering is enforced
correctly.
Without a journal, the block's buffer_head is written to and marked
dirty, and the bitmaps are updated via ext4_free_blocks(). In rare
cases, the block is reallocated to another inode and written to
before writeback runs; the writeback mechanism then flushes the stale
dirty extent header to disk, on top of the new data. That's why it
looks like "leaked extent data" in the data block.
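
To make the ordering concrete, here is a toy model of the sequence
described above. It is only a sketch, not kernel code: the ToyFs class,
its method names, and the block number are all made up for illustration,
and the "forget the dirty buffer when the block is freed" variant is just
one possible way to close the window, not necessarily the fix that will
be posted. The new owner's write goes straight to disk here, as the
O_DIRECT writes in the original report would.

```python
import struct

BLOCK_SIZE = 4096

# ext4 extent header: eh_magic, eh_entries, eh_max, eh_depth (le16 each),
# then eh_generation (le32). Values match an empty header for a 4096-byte block.
EXTENT_HEADER = struct.pack("<HHHHI", 0xF30A, 0, 340, 0, 0)


class ToyFs:
    """Hypothetical stand-in for a journal-less filesystem."""

    def __init__(self, forget_on_free):
        self.disk = {}    # block number -> bytes currently on disk
        self.dirty = {}   # block number -> dirty buffer awaiting writeback
        self.free = set() # blocks marked free in the bitmap
        self.forget_on_free = forget_on_free

    def update_metadata(self, blk, data):
        # Write to the buffer and mark it dirty; nothing reaches disk yet.
        self.dirty[blk] = data.ljust(BLOCK_SIZE, b"\0")

    def free_block(self, blk):
        # Mark the block free in the bitmap.
        self.free.add(blk)
        if self.forget_on_free:
            # Assumed mitigation: drop the pending write for a freed block.
            self.dirty.pop(blk, None)

    def allocate(self):
        return self.free.pop()

    def direct_write(self, blk, data):
        # Bypasses the buffer cache, like an O_DIRECT write.
        self.disk[blk] = data

    def writeback(self):
        # Flush every dirty buffer, whoever owns the block now.
        for blk, data in self.dirty.items():
            self.disk[blk] = data
        self.dirty.clear()


def run(forget_on_free):
    fs = ToyFs(forget_on_free)
    blk = 1234                               # arbitrary extent-tree block
    fs.update_metadata(blk, EXTENT_HEADER)   # last header update
    fs.free_block(blk)                       # bitmaps updated
    new_blk = fs.allocate()                  # reused by another inode
    payload = b"D" * BLOCK_SIZE
    fs.direct_write(new_blk, payload)        # new owner's data hits disk
    fs.writeback()                           # stale header flushed afterwards
    on_disk = fs.disk[new_blk]
    return on_disk == payload, on_disk[:12]
```

Calling run(False) returns a block whose first 12 bytes are the extent
header rather than the payload; run(True) leaves the payload intact.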
I'm discussing with Frank whether we should handle this in
ext4_handle_dirty_metadata(), as per Ted's suggestion, or in separate
one-off patches, or what.
Thanks,
Curt
>
> Thanks,
>
> -VAL
>
> On Sat, Aug 22, 2009 at 04:10:56PM -0700, Curt Wohlgemuth wrote:
>> On the off chance that this sounds familiar to anyone out there...
>>
>> I've got a situation in which data files written by an application are
>> showing very occasional checksum errors. The data files are
>> all around 8MB long, written using O_DIRECT into fallocated space.
>> (The entire fallocated space for the example file below is written to
>> with valid data; i.e., no holes, no truncation, no uninitialized
>> extents.)
>>
>> When these occasional checksum failures show up, the data in the files
>> is rather odd. I've seen 4 cases of this so far; the "bad" data
>> always starts on a block boundary, and its first 12 bytes are always
>> identical to what an extent header would look like (for a
>> header at the start of a block of extents or extent indexes):
>>
>> Here's the "od -Ad -x" output from one such file:
>>
>> 8388608 f30a 0000 0154 0000 0000 0000 0000 0000
>>
>> (I.e., the first 2 bytes are EXT4_EXT_MAGIC, and bytes 4-5 are 0x154,
>> or what eh_max would be for a block size of 4096 bytes.)
>>
>> In this case, the "bad" data starts at block 2048. Two cases have
>> this pattern at block 2048; two at block 2050. A syscall trace of one
>> such corrupted file shows that this block was written with a single
>> write encompassing many adjacent blocks:
>>
>> write(fd=10, size=192512, offset=8204288)
>>
>> The file in question above has only two (in-inode) extents, which I
>> verified look valid. The block in question (2048) above is covered by
>> the second extent: logical blocks 2037-2050.
>>
>> I've seen the amount of "bad" data (including the "extent header"
>> above) to be pretty variable: between 70 and 800 bytes; I haven't been
>> able to correlate the rest of the bad data to any particular ext4 data
>> structures.
>>
>> My guess is that a block of extents from a truncated or removed file
>> was reused for data for this file, and somehow was not written
>> correctly. This seems (slightly) more plausible to me than the extent
>> metadata of an existing file being "leaked" into this one.
>>
>> Does any of this ring a bell to anybody?
>>
>> Thanks,
>> Curt
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
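
For anyone who wants to double-check the numbers in the quoted report,
the snippet below decodes the "od -x" words as an ext4 extent header and
works out which blocks the traced write() covered. It is a sketch under
two assumptions: that od ran on a little-endian host (so each printed
word is a little-endian 16-bit value), and that both extents and extent
indexes are 12 bytes, like the header. The function names are made up
for this example.

```python
import struct

BLOCK_SIZE = 4096

# The eight 16-bit words from the "od -Ad -x" line in the report.
OD_WORDS = "f30a 0000 0154 0000 0000 0000 0000 0000"


def words_to_bytes(words):
    # Rebuild the raw bytes, assuming od printed little-endian 16-bit words.
    return b"".join(struct.pack("<H", int(w, 16)) for w in words.split())


def parse_extent_header(raw):
    # eh_magic, eh_entries, eh_max, eh_depth are le16; eh_generation is le32.
    magic, entries, max_entries, depth, generation = struct.unpack_from(
        "<HHHHI", raw, 0
    )
    return {
        "eh_magic": magic,
        "eh_entries": entries,
        "eh_max": max_entries,
        "eh_depth": depth,
        "eh_generation": generation,
    }


def expected_eh_max(block_size, header_size=12, entry_size=12):
    # Entries that fit in one block after the 12-byte header.
    return (block_size - header_size) // entry_size


def blocks_covered(offset, size, block_size=BLOCK_SIZE):
    # First and last logical block touched by a write at this offset/size.
    first = offset // block_size
    last = (offset + size - 1) // block_size
    return first, last


header = parse_extent_header(words_to_bytes(OD_WORDS))
write_range = blocks_covered(offset=8204288, size=192512)
```

With these inputs, header["eh_magic"] is 0xF30A and header["eh_max"] is
340 (0x154), which equals expected_eh_max(4096). The traced write spans
logical blocks 2003 through 2049, so it does include block 2048, and the
od offset 8388608 is exactly 2048 * 4096.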