Message-Id: <1160589284.1447.19.camel@dyn9047017100.beaverton.ibm.com>
Date:	Wed, 11 Oct 2006 10:54:44 -0700
From:	Badari Pulavarty <pbadari@...ibm.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Eric Sandeen <sandeen@...deen.net>,
	Eric Sandeen <esandeen@...hat.com>,
	Dave Jones <davej@...hat.com>, Andrew Morton <akpm@...l.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: 2.6.18 ext3 panic.

On Wed, 2006-10-11 at 16:22 +0200, Jan Kara wrote:
> > Jan Kara wrote:
> > 
> > >  Umm, but these two traces confuse me:
> > >1) They are different traces than those you wrote about initially,
> > >aren't they? Because here we would not call sync_dirty_buffer() from
> > >journal_dirty_data().
> > >  BTW: Does this buffer trace lead to that Oops in submit_bh()? I guess not
> > >as the buffer is not dirty...
> > 
> > They do wind up at the same oops, from the same "testcase" (i.e. beat the 
> > tar out of the filesystem with multiple fsx's and fsstress...)
> > 
> > The buffer is not dirty at that tracepoint because it has just done
> >                 if (locked && test_clear_buffer_dirty(bh)) {
> > prior to the tracepoint...
>   Oh, I see. OK.
> 
> > 
> > See the whole traces at
> > 
> > http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt
> > http://people.redhat.com/esandeen/traces/eric_ext3_oops2.txt
>   Hmm, those traces look really useful. I just have to digest them ;).
> 
> > As an aside, when we do journal_unmap_buffer... should it stay on 
> > t_sync_datalist?
>   Yes, it should and it seems it really was removed from it at some
> point. Only later journal_dirty_data() came and filed it back to the
> BJ_SyncData list. And the buffer remained unmapped till the commit time
> and then *bang*... It may even be a race in ext3 itself that it called
> journal_dirty_data() on an unmapped buffer but I have to read some more
> code.
> 

Yes, calling journal_dirty_data() on an unmapped buffer can definitely
happen. (The only thing I am not sure about is why it doesn't happen
with a simple testcase, like dirtying only part of a page on a 1k
filesystem. I am also not sure why we need journal_unmap_buffer() in
the sequence.)

Here is what I think is happening:

journal_unmap_buffer() cleaned the buffer, since it's outside EOF, but
it's part of the same page, so it remained on the page->buffers
list. (At this time it's not part of any transaction.)

Then ordered_commit_write() called journal_dirty_data() and we added
all these buffers to the BJ_SyncData list. (At this time the buffer is
clean, not dirty.)

Now msync() called __set_page_dirty_buffers() and dirtied *all* the
buffers attached to this page.

journal_submit_data_buffers() got around to this buffer and tried to
submit it while it was still unmapped, hence the oops in submit_bh().
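
As a rough illustration of the steps above, here is a toy user-space
model in C. It is not kernel code: the struct and function names
(toy_buffer, toy_run_sequence and so on) are hypothetical, and it
assumes, as described in this message, that every buffer on the page
gets filed on the sync data list.

```c
/* Toy user-space model of the sequence above. NOT kernel code:
 * every name here is hypothetical and chosen for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BUFS_PER_PAGE 4	/* e.g. a 4k page with 1k blocks */

struct toy_buffer {
	bool mapped;		/* backed by a disk block */
	bool dirty;
	bool on_syncdata;	/* filed on the transaction's sync data list */
};

struct toy_page {
	struct toy_buffer bufs[TOY_BUFS_PER_PAGE];
};

/* Step 1: buffers past EOF are cleaned and unmapped, but they stay
 * attached to the page. */
static void toy_unmap_beyond_eof(struct toy_page *pg, size_t first_beyond)
{
	for (size_t i = first_beyond; i < TOY_BUFS_PER_PAGE; i++) {
		pg->bufs[i].mapped = false;
		pg->bufs[i].dirty = false;
		pg->bufs[i].on_syncdata = false;
	}
}

/* Step 2: the commit_write path files all buffers as sync data. */
static void toy_file_all_as_sync_data(struct toy_page *pg)
{
	for (size_t i = 0; i < TOY_BUFS_PER_PAGE; i++)
		pg->bufs[i].on_syncdata = true;
}

/* Step 3: dirtying the page dirties every buffer attached to it. */
static void toy_dirty_all_buffers(struct toy_page *pg)
{
	for (size_t i = 0; i < TOY_BUFS_PER_PAGE; i++)
		pg->bufs[i].dirty = true;
}

/* Step 4: count buffers the commit would submit while unmapped. */
static int toy_count_unmapped_submits(const struct toy_page *pg)
{
	int bad = 0;

	for (size_t i = 0; i < TOY_BUFS_PER_PAGE; i++) {
		const struct toy_buffer *b = &pg->bufs[i];

		if (b->on_syncdata && b->dirty && !b->mapped)
			bad++;
	}
	return bad;
}

/* Run steps 1-4; first_beyond is the index of the first buffer past EOF. */
static int toy_run_sequence(size_t first_beyond)
{
	struct toy_page pg;

	assert(first_beyond <= TOY_BUFS_PER_PAGE);
	for (size_t i = 0; i < TOY_BUFS_PER_PAGE; i++)
		pg.bufs[i] = (struct toy_buffer){ .mapped = true };

	toy_unmap_beyond_eof(&pg, first_beyond);
	toy_file_all_as_sync_data(&pg);
	toy_dirty_all_buffers(&pg);
	return toy_count_unmapped_submits(&pg);
}
```

With only the first block inside EOF, toy_run_sequence(1) counts three
buffers that reach the submit step dirty but unmapped.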

Andrew is right: the only option for us is to check the file size in
the writeout path and skip the buffers beyond EOF.
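
To make the "skip the buffers beyond EOF" idea concrete, here is a
minimal sketch of the offset arithmetic in plain C. It is only an
illustration, not the actual patch: the names (sketch_buffer_beyond_eof,
SKETCH_PAGE_SIZE) and the fixed 4k page size are assumptions.

```c
/* Sketch of an EOF check for the buffers of one page. NOT kernel code:
 * names and the fixed 4k page size are assumptions for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* True if the buffer's first byte lies at or beyond the file size. */
static bool sketch_buffer_beyond_eof(uint64_t i_size, uint64_t page_index,
				     unsigned buf_index, unsigned blocksize)
{
	uint64_t start = page_index * SKETCH_PAGE_SIZE +
			 (uint64_t)buf_index * blocksize;

	return start >= i_size;
}

/* Count the buffers of a page that a writeout path would skip. */
static unsigned sketch_count_skipped(uint64_t i_size, uint64_t page_index,
				     unsigned blocksize)
{
	unsigned skipped = 0;

	assert(blocksize != 0 && blocksize <= SKETCH_PAGE_SIZE);
	for (unsigned i = 0; i < SKETCH_PAGE_SIZE / blocksize; i++)
		if (sketch_buffer_beyond_eof(i_size, page_index, i, blocksize))
			skipped++;
	return skipped;
}
```

For a 1500-byte file with 1k blocks, the buffers of page 0 start at
offsets 0, 1024, 2048 and 3072, so sketch_count_skipped(1500, 0, 1024)
skips the last two.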

Thanks,
Badari
