Message-ID: <A121ABA5B472B74EB59076B8E3C8F0190260FAB3@rtpe2k01.adaptec.com>
Date: Mon, 16 Oct 2006 09:40:03 -0400
From: "Hammer, Jack" <Jack_Hammer@...ptec.com>
To: "Nick Piggin" <nickpiggin@...oo.com.au>,
"Nishanth Aravamudan" <nacc@...ibm.com>
Cc: "LKML" <linux-kernel@...r.kernel.org>
Subject: RE: ips: scheduling while atomic in 2.6.18
The MDELAY-to-msleep changes are part of a critical bug fix, so if you
change them all back, you're re-introducing the bug. Without them, a
reset can cause a lockup (caught by the 2.6 softlockup watchdog).
But you should still try it, and maybe we have to think of another
solution if this is causing your problem.
Let me know what happens if you change it back ...
Jack
-----Original Message-----
From: Nick Piggin [mailto:nickpiggin@...oo.com.au]
Sent: Sunday, October 15, 2006 11:20 AM
To: Nishanth Aravamudan
Cc: IpsLinux; LKML; Hammer, Jack
Subject: Re: ips: scheduling while atomic in 2.6.18
Nishanth Aravamudan wrote:
> Hi all,
>
> A server I administer just dumped three scheduling while atomics
> before (sort of) hanging hard. Still responds to ping, but ssh is now
> dead and the serial console stopped logging.
>
> 8-way PIII, 2.6.18 with the 3:1 split. Wanted to get my report out
> there before I reset the box, though.
Thanks for the report. The messages are caused by this commit (cc'ed
author):
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=15084a4a63bc300c18b28a8a9afac870c552abce
Not sure whether they are the cause of your hang, but from the
changelog it doesn't look like the commit was strictly a bugfix, so you
could try changing the msleep calls in the driver back to MDELAY.
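
For anyone following along, the check behind the messages quoted below
can be modelled in user space. This is only a sketch, not driver or
kernel code: preempt_count, model_spin_lock(), model_msleep(),
model_mdelay() and model_reset() are hypothetical stand-ins, and it
assumes the reset path runs with one spinlock held. The 0x00000001 in
the messages is consistent with that, but it is not confirmed here
against the driver source.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins, not the kernel's implementations. */
static unsigned int preempt_count;   /* models the per-task preempt counter */
static int atomic_bugs;              /* times the model "scheduled while atomic" */

static void model_spin_lock(void)
{
	preempt_count++;             /* taking a spinlock disables preemption */
}

static void model_spin_unlock(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/* Models msleep(): sleeping enters the scheduler, which checks the counter. */
static void model_msleep(unsigned int ms)
{
	(void)ms;
	if (preempt_count != 0) {
		atomic_bugs++;
		printf("BUG: scheduling while atomic: model/0x%08x/0\n",
		       preempt_count);
	}
}

/* Models mdelay(): a busy-wait that never enters the scheduler. */
static void model_mdelay(unsigned int ms)
{
	(void)ms;
}

/* Runs a modelled reset under one lock; returns the number of new BUG reports. */
int model_reset(int use_msleep)
{
	int before = atomic_bugs;

	model_spin_lock();
	if (use_msleep)
		model_msleep(1000);
	else
		model_mdelay(1000);
	model_spin_unlock();

	return atomic_bugs - before;
}
```

model_reset(1) produces one report and model_reset(0) produces none.
The model leaves out the other half of the trade-off: a real mdelay()
spins on the CPU for the whole delay instead of yielding it.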
>
> ips 0000:0d:06.0: Resetting controller.
> BUG: scheduling while atomic: ipssend/0x00000001/11199 [<c0352e4d>]
> schedule+0x8ad/0x920 [<c03550ef>] _spin_unlock_irqrestore+0xf/0x30
> [<c011c443>] release_console_sem+0x203/0x220 [<c011caee>]
> vprintk+0x29e/0x380 [<c0126ab0>] lock_timer_base+0x20/0x50
> [<c03550ef>] _spin_unlock_irqrestore+0xf/0x30 [<c0126c0c>]
> __mod_timer+0x9c/0xc0 [<c03536c7>] schedule_timeout+0x57/0xd0
> [<c0125fb0>] process_timeout+0x0/0x10 [<c0126e58>] msleep+0x28/0x40
> [<c02ba521>] ips_reset_copperhead_memio+0x21/0x60
> [<c02b803c>] __ips_eh_reset+0x17c/0x380 [<c028f880>]
> scsi_done+0x0/0x30 [<c02bba2e>] ips_queue+0x17e/0x1b0 [<c028fdc1>]
> scsi_dispatch_cmd+0x161/0x260 [<c028f880>] scsi_done+0x0/0x30
> [<c0292a90>] scsi_times_out+0x0/0x80 [<c0294ea7>]
> scsi_request_fn+0x187/0x2f0 [<c0212b0e>]
> blk_execute_rq_nowait+0x6e/0xc0 [<c0294c11>]
> scsi_execute_async+0x2b1/0x3c0 [<c0294620>] scsi_end_async+0x0/0x60
> [<c02c7940>] sg_cmd_done+0x0/0x260 [<c02c7e28>]
> sg_common_write+0x288/0x700 [<c02c7940>] sg_cmd_done+0x0/0x260
> [<c02c9aec>] sg_write+0x21c/0x300 [<c0350000>]
> sunrpc_cache_lookup+0x140/0x150 [<c0177837>] do_ioctl+0x87/0x90
> [<c01638b5>] vfs_write+0xb5/0x190 [<c016408b>] sys_write+0x4b/0x80
> [<c010329b>] syscall_call+0x7/0xb
> BUG: scheduling while atomic: ipssend/0x00000001/11199 [<c0352e4d>]
> schedule+0x8ad/0x920 [<c03550ef>] _spin_unlock_irqrestore+0xf/0x30
> [<c011c443>] release_console_sem+0x203/0x220 [<c0126ab0>]
> lock_timer_base+0x20/0x50 [<c03550ef>]
> _spin_unlock_irqrestore+0xf/0x30 [<c0126c0c>] __mod_timer+0x9c/0xc0
> [<c03536c7>] schedule_timeout+0x57/0xd0 [<c0125fb0>]
> process_timeout+0x0/0x10 [<c0126e58>] msleep+0x28/0x40 [<c02ba537>]
> ips_reset_copperhead_memio+0x37/0x60
> [<c02b803c>] __ips_eh_reset+0x17c/0x380 [<c028f880>]
> scsi_done+0x0/0x30 [<c02bba2e>] ips_queue+0x17e/0x1b0 [<c028fdc1>]
> scsi_dispatch_cmd+0x161/0x260 [<c028f880>] scsi_done+0x0/0x30
> [<c0292a90>] scsi_times_out+0x0/0x80 [<c0294ea7>]
> scsi_request_fn+0x187/0x2f0 [<c0212b0e>]
> blk_execute_rq_nowait+0x6e/0xc0 [<c0294c11>]
> scsi_execute_async+0x2b1/0x3c0 [<c0294620>] scsi_end_async+0x0/0x60
> [<c02c7940>] sg_cmd_done+0x0/0x260 [<c02c7e28>]
> sg_common_write+0x288/0x700 [<c02c7940>] sg_cmd_done+0x0/0x260
> [<c02c9aec>] sg_write+0x21c/0x300 [<c0350000>]
> sunrpc_cache_lookup+0x140/0x150 [<c0177837>] do_ioctl+0x87/0x90
> [<c01638b5>] vfs_write+0xb5/0x190 [<c016408b>] sys_write+0x4b/0x80
> [<c010329b>] syscall_call+0x7/0xb
> BUG: scheduling while atomic: ipssend/0x00000001/11199 [<c0352e4d>]
> schedule+0x8ad/0x920 [<c0352947>] schedule+0x3a7/0x920 [<c0126ab0>]
> lock_timer_base+0x20/0x50 [<c03550ef>]
> _spin_unlock_irqrestore+0xf/0x30 [<c0126c0c>] __mod_timer+0x9c/0xc0
> [<c03536c7>] schedule_timeout+0x57/0xd0 [<c03550ef>]
> _spin_unlock_irqrestore+0xf/0x30 [<c0125fb0>]
> process_timeout+0x0/0x10 [<c0126e58>] msleep+0x28/0x40 [<c02ba5e0>]
> ips_init_copperhead_memio+0x20/0x150
> [<c0125fb0>] process_timeout+0x0/0x10 [<c02ba542>]
> ips_reset_copperhead_memio+0x42/0x60
> [<c02b803c>] __ips_eh_reset+0x17c/0x380 [<c028f880>]
> scsi_done+0x0/0x30 [<c02bba2e>] ips_queue+0x17e/0x1b0 [<c028fdc1>]
> scsi_dispatch_cmd+0x161/0x260 [<c028f880>] scsi_done+0x0/0x30
> [<c0292a90>] scsi_times_out+0x0/0x80 [<c0294ea7>]
> scsi_request_fn+0x187/0x2f0 [<c0212b0e>]
> blk_execute_rq_nowait+0x6e/0xc0 [<c0294c11>]
> scsi_execute_async+0x2b1/0x3c0 [<c0294620>] scsi_end_async+0x0/0x60
> [<c02c7940>] sg_cmd_done+0x0/0x260 [<c02c7e28>]
> sg_common_write+0x288/0x700 [<c02c7940>] sg_cmd_done+0x0/0x260
> [<c02c9aec>] sg_write+0x21c/0x300 [<c0350000>]
> sunrpc_cache_lookup+0x140/0x150 [<c0177837>] do_ioctl+0x87/0x90
> [<c01638b5>] vfs_write+0xb5/0x190 [<c016408b>] sys_write+0x4b/0x80
> [<c010329b>] syscall_call+0x7/0xb
>
> Thanks,
> Nish
>
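
The first line of each report above can be read as task name, preempt
count and PID, assuming the 2.6.18 format string "%s/0x%08x/%d" in
schedule(). A small user-space sketch of that decoding follows;
parse_atomic_bug() and struct atomic_bug are made-up names for
illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three fields; not a kernel structure. */
struct atomic_bug {
	char comm[16];               /* task name: at most 15 chars plus NUL */
	unsigned int preempt_count;
	int pid;
};

/* Returns 0 on success, -1 if the line does not match the assumed format. */
int parse_atomic_bug(const char *line, struct atomic_bug *out)
{
	assert(line != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	if (sscanf(line, "BUG: scheduling while atomic: %15[^/]/0x%x/%d",
		   out->comm, &out->preempt_count, &out->pid) != 3)
		return -1;
	return 0;
}
```

Fed the first report above (without the leading quote marker), this
yields ipssend, 1 and 11199. A preempt count of 1 would be consistent
with a single spinlock being held when msleep was called.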
--
SUSE Labs, Novell Inc.