Message-ID: <20080306090029.GA6215@elte.hu>
Date:	Thu, 6 Mar 2008 10:00:29 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Jens Axboe <jens.axboe@...cle.com>
Subject: Re: Linux 2.6.25-rc4
* Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> In particular, the block layer changes should hopefully have sorted 
> themselves out, and CD burning etc hopefully works for people again. 
hm, tonight's randconfig bootrun produced a failing (soft-hung) kernel 
after about 120 iterations - and the log i captured _seems_ to indicate 
some block IO (or libata) completion weirdness.
unfortunately, it's not readily reproducible, and i triggered it with
about 100 sched.git and 300 x86.git patches applied. BUT, virtually the
same 100+300 patch queue passed a 1000+ iteration randconfig
testrun over the last weekend, so i'm reasonably sure the regression is
new and came in via upstream. Also, the config is UP (and it's a rather 
simple config in other aspects as well), so this must be something 
rather fundamental, not an SMP race.
I just spent about an hour trying to figure out a pattern, but the bug
just doesn't reproduce: 20 bootup attempts with the same bzImage all
came up fine. When it did hang, it stayed hung for hours, so the
condition is permanent.
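As a rough sanity check on what 20 clean boots can and cannot tell us, here is a small sketch. It assumes, purely for illustration, that the hang is an independent per-boot event with a fixed probability. That is not established: the 1-in-120 figure comes from a run over different random configs, so it may not apply to this one bzImage. The function name prob_all_clean is made up for this example.

```python
def prob_all_clean(p_hang, boots):
    """Chance that `boots` consecutive boots all succeed.

    Assumes (hypothetically) that each boot hangs independently
    with the same probability p_hang.
    """
    return (1.0 - p_hang) ** boots


if __name__ == "__main__":
    # 1/120 is the observed rate across the randconfig run; the other
    # two rates are arbitrary comparison points, not measurements.
    for p_hang in (1 / 120, 1 / 20, 1 / 5):
        chance = prob_all_clean(p_hang, 20)
        print(f"p_hang={p_hang:.4f}  P(20 clean boots)={chance:.3f}")
```

Under that assumption a hang rate near 1 in 120 would leave 20 clean boots as the most likely outcome, so the failed reproduction does not say much about whether the bug is tied to this particular config.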
I've attached the bootup log which includes the SysRq-T output and the 
config. The hang seems to occur because an rc.sysinit task is not coming 
back from io_schedule():
rc.sysinit    D f75bcc24     0  1922   1893
       f761c810 00000086 f75bcd38 f75bcc24 1954bff5 00000015 f7746000 f761c974 
       f761c974 f7c17698 c180e7a8 f7747cc4 00000000 f7747ccc c180e7a8 c097bff7 
       c01a3acb c097c27d c01a3aa0 f7872a90 00000002 c01a3aa0 f7747e48 c097c2fc 
Call Trace:
 [<c097bff7>] io_schedule+0x37/0x70
 [<c01a3acb>] sync_buffer+0x2b/0x30
 [<c097c27d>] __wait_on_bit+0x4d/0x80
 [<c01a3aa0>] sync_buffer+0x0/0x30
 [<c01a3aa0>] sync_buffer+0x0/0x30
 [<c097c2fc>] out_of_line_wait_on_bit+0x4c/0x60
 [<c0142340>] wake_bit_function+0x0/0x40
 [<c01a3a51>] __wait_on_buffer+0x21/0x30
 [<c0209915>] ext3_bread+0x55/0x70
 [<c020cff8>] ext3_find_entry+0x258/0x660
 [<c03a0026>] avc_has_perm+0x46/0x50
 [<c03a0d14>] inode_has_perm+0x44/0x80
 [<c020de69>] ext3_lookup+0x29/0xa0
 [<c0189f90>] do_lookup+0x130/0x180
 [<c018b540>] __link_path_walk+0x340/0xd50
 [<c03a0d14>] inode_has_perm+0x44/0x80
 [<c018bf8a>] link_path_walk+0x3a/0xa0
 [<c016feb4>] __do_fault+0x1a4/0x3d0
 [<c018c1b7>] do_path_lookup+0x77/0x210
 [<c018cb57>] __user_walk_fd+0x27/0x40
 [<c01860d5>] vfs_stat_fd+0x15/0x40
 [<c016feb4>] __do_fault+0x1a4/0x3d0
 [<c01861ef>] sys_stat64+0xf/0x30
 [<c0125a5d>] do_page_fault+0x2ad/0x670
 [<c03db6cc>] trace_hardirqs_on_thunk+0xc/0x10
 [<c0115a5f>] sysenter_past_esp+0x5f/0x90
 =======================
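For anyone digging through the full SysRq-T dump in the attached log, a minimal sketch of a filter that picks out tasks in D state whose trace passes through io_schedule() may save some scrolling. The regexes are assumptions based only on the excerpt above (task header with the PID in the fifth column, frames printed as "[<addr>] symbol+0xoff/0xlen"); logs with timestamps or other prefixes would need them adjusted. The function name and the SAMPLE text are made up for this example.

```python
import re

# Task header as it appears in the excerpt above, e.g.
#   rc.sysinit    D f75bcc24     0  1922   1893
# Assumed columns: name, state, PC, free stack, PID, parent PID.
TASK_RE = re.compile(r"^(\S+)\s+([RSDTZX])\s+[0-9a-f]{8}\s+\d+\s+(\d+)\s+\d+")

# Stack frame line, e.g. " [<c097bff7>] io_schedule+0x37/0x70"
FRAME_RE = re.compile(r"^\s*\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def tasks_blocked_in(log_text, symbol="io_schedule"):
    """Return D-state tasks whose call trace contains `symbol`."""
    tasks = []
    current = None
    for line in log_text.splitlines():
        task = TASK_RE.match(line)
        if task:
            # Start collecting frames for a new task.
            current = {
                "name": task.group(1),
                "state": task.group(2),
                "pid": int(task.group(3)),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            current["frames"].append(frame.group(1))
    return [t for t in tasks if t["state"] == "D" and symbol in t["frames"]]


# Shortened copy of the trace above, used only to exercise the parser.
SAMPLE = """\
rc.sysinit    D f75bcc24     0  1922   1893
Call Trace:
 [<c097bff7>] io_schedule+0x37/0x70
 [<c01a3acb>] sync_buffer+0x2b/0x30
 [<c01a3a51>] __wait_on_buffer+0x21/0x30
 [<c0209915>] ext3_bread+0x55/0x70
"""

if __name__ == "__main__":
    for t in tasks_blocked_in(SAMPLE):
        print(t["name"], t["pid"], " <- ".join(t["frames"][:4]))
```

Running the same function over the whole hang.log text would show whether rc.sysinit is the only task stuck waiting on IO or whether others are blocked in the same place.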
So the last known-good kernel would be last Friday's -git:
  commit d395991c117d43bfca97101a931a41d062a93852
  Merge: b73384f... b445c56...
  Author: Linus Torvalds <torvalds@...dy.linux-foundation.org>
  Date:   Fri Feb 29 16:54:33 2008 -0800
but ... "git-log d395991c117d4.. block/" does not show anything 
particularly exciting. Note that the IO scheduler in question is:
  CONFIG_DEFAULT_IOSCHED="anticipatory"
so it's not the usual CFQ - that's due to randconfig.
	Ingo
View attachment "hang.log" of type "text/plain" (234096 bytes)
View attachment "config.hang" of type "text/plain" (48314 bytes)