linux-kernel - Re: 2.6.18 ext3 panic.


Message-ID: <452DC5C5.3040507@comcast.net>
Date:	Wed, 11 Oct 2006 21:34:13 -0700
From:	John Wendel <jwendel10@...cast.net>
To:	Eric Sandeen <esandeen@...hat.com>
CC:	Badari Pulavarty <pbadari@...ibm.com>, Jan Kara <jack@...e.cz>,
	Eric Sandeen <sandeen@...deen.net>,
	Dave Jones <davej@...hat.com>, Andrew Morton <akpm@...l.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: 2.6.18 ext3 panic.

Eric Sandeen wrote:
> Badari Pulavarty wrote:
>
>> Here is what I think is happening..
>>
>> journal_unmap_buffer() - cleaned the buffer, since it's outside EOF, but
>> it's a part of the same page. So it remained on the page->buffers
>> list. (at this time it's not part of any transaction).
>>
>> Then, ordered_commit_write() called journal_dirty_data() and we added
>> all these buffers to BJ_SyncData list. (at this time buffer is clean -
>> not dirty).
>>
>> Now msync() called __set_page_dirty_buffers() and dirtied *all* the
>> buffers attached to this page.
>>
>> journal_submit_data_buffers() got around to this buffer and tried to
>> submit the buffer...
>
> This seems about right, but one thing bothers me in the traces; it 
> seems like there is some locking that is missing.  In
> http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt
> for example, it looks like journal_dirty_data gets started, but then 
> the buffer_head is acted on by journal_unmap_buffer, which decides 
> this buffer is part of the running transaction, past EOF, and clears 
> mapped, dirty, etc.  Then journal_dirty_data picks up again, decides 
> that the buffer is not on the right list (now BJ_None) and puts it 
> back on BJ_SyncData.  Then it gets picked up by 
> journal_submit_data_buffers and submitted, and oops.
>
> Talking with Stephen, it seemed like the page lock should synchronize 
> these threads, but I've found that we can get to journal_dirty_data 
> acting on the buffer heads w/o having the page locked...
>
> I'm still digging, and, er, grasping at straws here... Am I off base?
>
> -Eric
>
>
>> Andrew is right - the only option for us is to check the file size in
>> the write-out path and skip the buffers beyond EOF.
>>
>> Thanks,
>> Badari
>>

Here's another data point for your consideration. I've been seeing this
error since I started running 2.6.18. I assumed it was hardware, so I've
tried 3 different disks (one PATA and two SATA drives) with VIA and
Promise controllers; the error has occurred on all of them. I see the
error infrequently, always when downloading lots of small files from
Usenet and building, copying and deleting large (200 - 300 MB) files. I
haven't ever had an oops/panic, just this error. When I run fsck, I
always see a single message that "deleted inode nnn has zero dtime". I
hope this will be useful.

Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5): ext3_free_blocks_sb: bit already cleared for block 4740550
Oct 11 20:37:32 Godzilla kernel: Aborting journal on device hda5.
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_free_blocks_sb: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_free_blocks_sb: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_truncate: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_orphan_del: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_delete_inode: Journal has aborted
Oct 11 20:37:32 Godzilla kernel: __journal_remove_journal_head: freeing b_committed_data
Oct 11 20:37:32 Godzilla kernel: __journal_remove_journal_head: freeing b_committed_data
Oct 11 20:37:32 Godzilla kernel: ext3_abort called.
Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5): ext3_journal_start_sb: Detected aborted journal
Oct 11 20:37:32 Godzilla kernel: Remounting filesystem read-only
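
For anyone following the quoted analysis, here is a rough user-space
sketch of the check Badari describes above: skipping buffers that lie
beyond EOF in the write-out path. This is only a toy model of the
arithmetic, not kernel code and not the actual patch. MODEL_PAGE_SIZE,
buffer_beyond_eof() and buffers_to_submit() are hypothetical names made
up for illustration, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for this model; the real value depends on the architecture. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Hypothetical helper: true when the buffer starting at byte offset
 * buf_off inside page page_index begins at or after end-of-file,
 * i.e. it holds no file data at all.
 */
bool buffer_beyond_eof(uint64_t page_index, uint32_t buf_off, uint64_t file_size)
{
    assert(buf_off < MODEL_PAGE_SIZE);
    uint64_t start = page_index * MODEL_PAGE_SIZE + buf_off;
    return start >= file_size;
}

/*
 * Hypothetical helper: count the buffers of one page that a write-out
 * path would still submit if it skipped every buffer beyond EOF.
 */
unsigned buffers_to_submit(uint64_t page_index, uint32_t block_size, uint64_t file_size)
{
    assert(block_size > 0 && MODEL_PAGE_SIZE % block_size == 0);
    unsigned count = 0;
    for (uint32_t off = 0; off < MODEL_PAGE_SIZE; off += block_size) {
        if (!buffer_beyond_eof(page_index, off, file_size))
            count++;
    }
    return count;
}
```

With 1024-byte blocks and a 5000-byte file, the second page (index 1)
has four buffers attached, but only the first one overlaps file data, so
a write-out path applying this check would submit that one and skip the
other three even if msync() had dirtied all of them.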
