Message-ID: <452DC5C5.3040507@comcast.net>
Date: Wed, 11 Oct 2006 21:34:13 -0700
From: John Wendel <jwendel10@...cast.net>
To: Eric Sandeen <esandeen@...hat.com>
CC: Badari Pulavarty <pbadari@...ibm.com>, Jan Kara <jack@...e.cz>,
Eric Sandeen <sandeen@...deen.net>,
Dave Jones <davej@...hat.com>, Andrew Morton <akpm@...l.org>,
Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: 2.6.18 ext3 panic.
Eric Sandeen wrote:
> Badari Pulavarty wrote:
>
>> Here is what I think is happening..
>>
>> journal_unmap_buffer() - cleaned the buffer, since it's outside EOF, but
>> it's a part of the same page. So it remained on the page->buffers
>> list. (at this time it's not part of any transaction).
>>
>> Then, ordered_commit_write() called journal_dirty_data() and we added
>> all these buffers to BJ_SyncData list. (at this time buffer is clean -
>> not dirty).
>>
>> Now msync() called __set_page_dirty_buffers() and dirtied *all* the
>> buffers attached to this page.
>>
>> journal_submit_data_buffers() got around to this buffer and tried to
>> submit the buffer...
>
> This seems about right, but one thing bothers me in the traces; it
> seems like there is some locking that is missing. In
> http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt
> for example, it looks like journal_dirty_data gets started, but then
> the buffer_head is acted on by journal_unmap_buffer, which decides
> this buffer is part of the running transaction, past EOF, and clears
> mapped, dirty, etc. Then journal_dirty_data picks up again, decides
> that the buffer is not on the right list (now BJ_None) and puts it
> back on BJ_SyncData. Then it gets picked up by
> journal_submit_data_buffers and submitted, and oops.
>
> Talking with Stephen, it seemed like the page lock should synchronize
> these threads, but I've found that we can get to journal_dirty_data
> acting on the buffer heads w/o having the page locked...
>
> I'm still digging, and, er, grasping at straws here... Am I off base?
>
> -Eric
>
>
>> Andrew is right - the only option for us is to check the file size in
>> the write-out path and skip the buffers beyond EOF.
>>
>> Thanks,
>> Badari
>>
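
For illustration only, the check quoted above (skip buffers beyond EOF in the write-out path) can be modelled in userspace C. This is a sketch under stated assumptions, not the ext3/JBD code and not a proposed patch: struct model_buffer, buffer_beyond_eof() and count_submittable() are hypothetical names, and the real buffer_head carries far more state than the two fields used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head: only what this sketch needs. */
struct model_buffer {
    long long file_offset; /* offset of the buffer's first byte in the file */
    bool dirty;            /* set for every buffer once the page is dirtied */
};

/* A buffer lies wholly beyond EOF when its first byte is at or past i_size. */
static bool buffer_beyond_eof(const struct model_buffer *b, long long i_size)
{
    return b->file_offset >= i_size;
}

/*
 * Model of a write-out pass: count the buffers that would be submitted,
 * skipping clean buffers and any buffer that starts at or past EOF.
 */
static size_t count_submittable(const struct model_buffer *bufs, size_t n,
                                long long i_size)
{
    size_t submitted = 0;

    assert(i_size >= 0);
    for (size_t i = 0; i < n; i++) {
        if (!bufs[i].dirty)
            continue;               /* nothing to write */
        if (buffer_beyond_eof(&bufs[i], i_size))
            continue;               /* beyond EOF: skip instead of submitting */
        submitted++;
    }
    return submitted;
}
```

With four 1024-byte buffers on one page, all dirtied, and a file size of 1500 bytes, only the first two buffers would be counted for submission; the two past EOF are skipped even though they are marked dirty.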
Here's another data point for your consideration. I've been seeing this
error since I started running 2.6.18. I assumed it was hardware, so I've
tried 3 different disks (a PATA and 2 SATA drives) with VIA and Promise
controllers; the error has occurred on all of them. I see the error
infrequently, always when downloading lots of small files from Usenet
and building, copying and deleting large files (200 - 300 MB). I haven't
ever had an oops/panic, just this error. When I run fsck, I always see a
single message that "deleted inode nnn has zero dtime". I hope this will
be useful.
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5): ext3_free_blocks_sb: bit already cleared for block 4740550
Oct 11 20:37:32 Godzilla kernel: Aborting journal on device hda5.
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_free_blocks_sb: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_free_blocks_sb: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_truncate: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_orphan_del: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_delete_inode: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: __journal_remove_journal_head: freeing b_committed_data
Oct 11 20:37:32 Godzilla kernel: __journal_remove_journal_head: freeing b_committed_data
Oct 11 20:37:32 Godzilla kernel: ext3_abort called.
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5): ext3_journal_start_sb: Detected aborted journal
Oct 11 20:37:32 Godzilla kernel: Remounting filesystem read-only
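
As a reading aid for the first log line, here is a small userspace model of what "bit already cleared" appears to report: a block being freed whose bit in the allocation bitmap is already clear. This is a sketch based on the message text only, not on the ext3 source; struct model_bitmap, model_alloc_block() and model_free_block() are hypothetical names, and the real code works on on-disk block group bitmaps under the journal.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NBLOCKS 64

/* Hypothetical allocation bitmap: one bit per block, set means in use. */
struct model_bitmap {
    unsigned char bits[MODEL_NBLOCKS / 8];
};

/* Mark a block as allocated. */
static void model_alloc_block(struct model_bitmap *bm, unsigned block)
{
    assert(block < MODEL_NBLOCKS);
    bm->bits[block / 8] |= (unsigned char)(1u << (block % 8));
}

/*
 * Free a block. Returns true if the bit was set and is now cleared,
 * false if the bit was already clear (the inconsistency that the
 * "bit already cleared for block N" message seems to describe).
 */
static bool model_free_block(struct model_bitmap *bm, unsigned block)
{
    unsigned char mask;
    bool was_set;

    assert(block < MODEL_NBLOCKS);
    mask = (unsigned char)(1u << (block % 8));
    was_set = (bm->bits[block / 8] & mask) != 0;
    bm->bits[block / 8] &= (unsigned char)~mask;
    return was_set;
}
```

Freeing the same block twice makes the second call return false, which is the condition a filesystem would have to treat as corruption rather than ignore.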