Message-ID: <46020385.50301@yahoo.com.au>
Date: Thu, 22 Mar 2007 15:18:13 +1100
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Adrian Bunk <bunk@...sta.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Michal Piotrowski <michal.k.k.piotrowski@...il.com>,
Mariusz Kozlowski <m.kozlowski@...land.pl>,
Oliver Pinter <oliver.pntr@...il.com>,
Sid Boyce <g3vbv@...eyonder.co.uk>,
Nick Piggin <npiggin@...e.de>,
Jens Axboe <jens.axboe@...cle.com>
Subject: Re: [1/6] 2.6.21-rc4: known regressions
Linus Torvalds wrote:
> In contrast, the hang reported by Mariusz Kozlowski has a slightly
> different feel to it, but there's a tantalizing pattern in there too:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/1243.html
>
> Call Trace:
> [<c03ec87e>] io_schedule+0x42/0x59
> [<c0184915>] sleep_on_buffer+0x8/0xc
> [<c03ed217>] __wait_on_bit+0x47/0x6c
> [<c03ed297>] out_of_line_wait_on_bit+0x5b/0x64
> [<c01848a8>] __wait_on_buffer+0x27/0x2d
> [<c01b4228>] journal_commit_transaction+0x707/0x127f
> [<c01b868b>] kjournald+0xac/0x1ed
> [<c0126af5>] kthread+0xa2/0xc9
> [<c010422b>] kernel_thread_helper+0x7/0x1c
>
> which certainly also looks like an IO never completed (or completed but
> never woke anything up).
>
> It also seems to be related to *buffers*. Maybe the whole bh layer thing
> is a fluke, but it's not waiting for normal data, it's very much waiting
> for those journal things that all use buffer heads. Which just makes me
> worry about those patches by Nick (which did come in through Andrew). I
> don't think it's the memorder one (it looks safe and shouldn't matter on
> x86 anyway!), but what about the
>
> fs: fix __block_write_full_page error case buffer submission
>
> locking change for example? Or that "fs: fix nobh data leak" thing with
> its fix? It uses "SetPageUptodate(page);" without waking up anybody who
> might wait for it (but the waiters here seem to wait on buffers, so that's
> probably not it)..

Nothing sleeps on PageUptodate, so I don't think that could explain it.

The "fs: fix __block_write_full_page error case buffer submission" patch
does change the locking, but I'd be really surprised if that was the
problem, because it changes locking to match the regular non-error path
submission.

It could be possible that ext3 is doing something weird and expecting
the old behaviour if it failed get_block, but that seems pretty weird
to do, and would need fixing.

"fs: fix nobh data leak"... again hard to see how it could cause an
unlock/wakeup to get lost. Is Mariusz using the nobh mount option?

It wouldn't hurt to test with these patches backed out...

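
To make the "completed but never woke anything up" case concrete, here is
a toy, single-threaded model of a lost wakeup. It is only a sketch and not
kernel code: the toy_* names, the fixed-size queue and the do_wake flag are
all made up for illustration, and it says nothing about whether any of the
patches above actually behave this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 4

/* Hypothetical stand-in for a task sleeping while it waits on a buffer. */
struct toy_waiter {
    bool asleep;
};

/* Hypothetical stand-in for a buffer head: a lock bit plus a wait queue. */
struct toy_buffer {
    bool locked;
    struct toy_waiter *queue[MAX_WAITERS];
    size_t nqueued;
};

/* Submit I/O: the buffer stays locked until the I/O completes. */
void toy_submit(struct toy_buffer *b)
{
    b->locked = true;
    b->nqueued = 0;
}

/* A waiter only goes to sleep if the lock bit is still set. */
void toy_wait_on_buffer(struct toy_buffer *b, struct toy_waiter *w)
{
    w->asleep = false;
    if (!b->locked)
        return;
    assert(b->nqueued < MAX_WAITERS);
    w->asleep = true;
    b->queue[b->nqueued++] = w;
}

/*
 * I/O completion: clear the lock bit, then wake the sleepers.
 * Passing do_wake = false models a completion path that clears
 * the bit but forgets the wakeup.
 */
void toy_end_io(struct toy_buffer *b, bool do_wake)
{
    b->locked = false;
    if (!do_wake)
        return;
    for (size_t i = 0; i < b->nqueued; i++)
        b->queue[i]->asleep = false;
    b->nqueued = 0;
}

/* Returns true if the waiter is still asleep after the I/O has completed. */
bool toy_waiter_hangs(bool do_wake)
{
    struct toy_buffer b;
    struct toy_waiter w;

    toy_submit(&b);
    toy_wait_on_buffer(&b, &w);
    toy_end_io(&b, do_wake);
    return w.asleep;
}
```

In this model toy_waiter_hangs(false) leaves the waiter asleep even though
the lock bit is already clear, while toy_waiter_hangs(true) does not. From
the waiter's side that state is indistinguishable from an I/O that never
completed at all, which is why the trace alone can't tell the two apart.
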
> Alternatively, maybe it really is an _io_ problem (and the buffer-head
> thing is just a red herring, and it could happen to other IO, it's just
> that metadata IO uses buffer heads), and it's the scheduler changes since
> 2.6.20..
I see what you mean. Could it be an ext3 or jbd change I wonder?
--
SUSE Labs, Novell Inc.