linux-kernel - Re: [1/6] 2.6.21-rc4: known regressions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <46020385.50301@yahoo.com.au>
Date:	Thu, 22 Mar 2007 15:18:13 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Adrian Bunk <bunk@...sta.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Michal Piotrowski <michal.k.k.piotrowski@...il.com>,
	Mariusz Kozlowski <m.kozlowski@...land.pl>,
	Oliver Pinter <oliver.pntr@...il.com>,
	Sid Boyce <g3vbv@...eyonder.co.uk>,
	Nick Piggin <npiggin@...e.de>,
	Jens Axboe <jens.axboe@...cle.com>
Subject: Re: [1/6] 2.6.21-rc4: known regressions

Linus Torvalds wrote:

> In contrast, the hang reported by Mariusz Kozlowski has a slightly 
> different feel to it, but there's a tantalizing pattern in there too:
> 
>   http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/1243.html
> 
> 	Call Trace:
> 	[<c03ec87e>] io_schedule+0x42/0x59
> 	[<c0184915>] sleep_on_buffer+0x8/0xc
> 	[<c03ed217>] __wait_on_bit+0x47/0x6c
> 	[<c03ed297>] out_of_line_wait_on_bit+0x5b/0x64
> 	[<c01848a8>] __wait_on_buffer+0x27/0x2d
> 	[<c01b4228>] journal_commit_transaction+0x707/0x127f
> 	[<c01b868b>] kjournald+0xac/0x1ed
> 	[<c0126af5>] kthread+0xa2/0xc9
> 	[<c010422b>] kernel_thread_helper+0x7/0x1c
> 
> which certainly also looks like an IO never completed (or completed but 
> never woke anything up).
> 
> It also seems to be related to *buffers*. Maybe the whole bh layer thing 
> is a fluke, but it's not waiting for normal data, it's very much waiting 
> for those journal things that all use buffer heads.Which just makes me 
> worry about those patches by Nick (which did come in through Andrew). I 
> don't think it's the memorder one (it looks safe and shouldn't matter on 
> x86 anyway!), but what about the
> 
> 	fs: fix __block_write_full_page error case buffer submission
> 
> locking change for example? Or that "fs: fix nobh data leak" thing with 
> its fix? It uses "SetPageUptodate(page);" without waking up anybody who 
> might wait for it (but the waiters here seem to wait on buffers, so that's 
> probably not it)..

Nothing sleeps on PageUptodate, so I don't think that could explain it.

The fs: fix __block_write_full_page error case buffer submission patch
does change the locking, but I'd be really suprised if that was the
problem, because it changes locking to match the regular non-error path
submission.

It could be possible that ext3 is doing something weird and expecting
the old behaviour if it failed get_block, but that seems pretty weird
to do, and would need fixing.

fs: nobh data leak... again hard to see how it could cause an unlock/wakeup
to get lost. Is Mariusz using the nobh mount option?

It wouldn't hurt to test with these patches backed out...

> Alternatively, maybe it really is an _io_ problem (and the buffer-head 
> thing is just a red herring, and it could happen to other IO, it's just 
> that metadata IO uses buffer heads), and it's the scheduler changes since 
> 2.6.20..

I see what you mean. Could it be an ext3 or jbd change I wonder?

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/