Message-ID: <636295BFF4A001418A00F46569A2CD2B161CE88B@US-PLNO-EXM01-P.global.tektronix.net>
Date: Tue, 9 Jul 2013 17:42:29 +0000
From: "Rich, Jason" <jason.rich@...comms.com>
To: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Panic at _blk_run_queue on 2.6.32
Greetings,
I've recently encountered an issue where multiple hosts fail to boot roughly one time in five. So far I have confirmed this
issue on three separate host machines. The issue appeared after updating from 2.6.32.39 to patch level 50, and again with patch level 61.
Both patch levels result in the failure described below. Since this occurs on multiple hosts, I feel I can safely rule out hardware.
It is also worth noting that I have not seen this behavior on the 3.4.26 kernel, or on any of my 32-bit hosts.
That said, I have to support this software release (which runs on the 2.6 kernel) for at least another two years.
I've looked through the open and closed issues on Bugzilla and see nothing similar.
The console log of the crash is below, followed by the output of the crash dump (using the crash tool).
Lsmod, lspci & kernel config attached.
I'm at a loss and consider myself a novice at debugging kernel issues. Any help is greatly appreciated.
Some details about the host:
1x Intel Xeon L5518 (quad core + HT)
32G DDR3
onboard eUSB
This is an ATCA blade (no doubt irrelevant to the issue)
From the console:
initramfs bootup: 2.6.32.61.TEK.V7.12.1.5024.p61 x86_64
<initramfs bootup...truncated as irrelevant to the issue at hand>
BOOT_IMAGE=/boot/bzImage-2.6.32.61.TEK.V7.12.1.5024.p61 -> /boot/bzImage-2.6.32.61.TEK.V7.12.1.5024.p61
<initramfs bootup...truncated as irrelevant to the issue at hand>
Setting kernel variables ... /etc/sysctl.conf...done.
Setting up X server socket directory /tmp/.X11-unix....
Setting up ICE socket directory /tmp/.ICE-unix....
Starting portmap daemon....
[ 30.757040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[ 30.765242] IP: [<ffffffff811c1eeb>] elv_queue_empty+0x12/0x24
[ 30.771296] PGD 0
[ 30.773408] Oops: 0000 [#1] PREEMPT SMP
[ 30.777525] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb1/1-3/1-3:1.0/host9/target9:0:0/9:0:0:0/scsi_device/9:0:0:0/uevent
[ 30.790203] CPU 0
[ 30.792253] Modules linked in: mptctl ipmi_poweroff igb ixgbe usb_storage ahci mptsas mptscsih mptbase scsi_transport_sas edd i2c_dev ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: usb_storage]
[ 30.812346] Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.32.61.TEK.V7.12.1.5024.p61 #1 ATCA-4500
[ 30.821280] RIP: 0010:[<ffffffff811c1eeb>] [<ffffffff811c1eeb>] elv_queue_empty+0x12/0x24
[ 30.829911] RSP: 0018:ffff880028203d28 EFLAGS: 00010046
[ 30.835445] RAX: 0000000000000000 RBX: ffff88083cf7ec98 RCX: ffff88083cf7ec98
[ 30.842885] RDX: ffff88083d3a88c0 RSI: 0000000000000292 RDI: ffff88083cf7ec98
[ 30.850226] RBP: ffff880028203d28 R08: ffff880028203d68 R09: 0000000000ade46b
[ 30.857708] R10: ffff88083ced5050 R11: ffff880028203d68 R12: 0000000000000292
[ 30.865182] R13: ffff880028203d98 R14: ffff88083ced5050 R15: 0000000000000000
[ 30.872594] FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
[ 30.881111] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 30.887087] CR2: 0000000000000040 CR3: 000000083db0a000 CR4: 00000000000006f0
[ 30.894484] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 30.901905] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 30.909376] Process ksoftirqd/0 (pid: 4, threadinfo ffff88083f8a6000, task ffff88083f884080)
[ 30.918231] Stack:
[ 30.920368] ffff880028203d48 ffffffff811c434b ffff88083cf7ec98 ffff88083cf7ec98
[ 30.927955] <0> ffff880028203d68 ffffffff811c43be ffff88083ced5000 ffff88083cf7ec98
[ 30.936074] <0> ffff880028203dd8 ffffffff812a9040 ffff880028203d88 ffff880028203d98
[ 30.944283] Call Trace:
[ 30.946853] <IRQ>
[ 30.949053] [<ffffffff811c434b>] __blk_run_queue+0x22/0x74
[ 30.954855] [<ffffffff811c43be>] blk_run_queue+0x21/0x35
[ 30.960479] [<ffffffff812a9040>] scsi_run_queue+0x20a/0x2a8
[ 30.966331] [<ffffffff812a9be7>] scsi_next_command+0x36/0x46
[ 30.972265] [<ffffffff812aa193>] scsi_end_request+0x7e/0x8f
[ 30.978145] [<ffffffff812aa4aa>] scsi_io_completion+0x16b/0x396
[ 30.984382] [<ffffffff812a46fc>] scsi_finish_command+0xb0/0xb9
[ 30.990575] [<ffffffff812aa7d8>] scsi_softirq_done+0xf3/0xfc
[ 30.996578] [<ffffffff811c8f27>] blk_done_softirq+0x67/0x77
[ 31.002493] [<ffffffff81042fe7>] __do_softirq+0xaa/0x147
[ 31.008176] [<ffffffff8100cc0c>] call_softirq+0x1c/0x28
[ 31.013640] <EOI>
[ 31.015819] [<ffffffff8100e023>] do_softirq+0x33/0x6b
[ 31.021149] [<ffffffff810431e5>] ksoftirqd+0x82/0x149
[ 31.026484] [<ffffffff81043163>] ? ksoftirqd+0x0/0x149
[ 31.031883] [<ffffffff81050ebd>] kthread+0x7a/0x82
[ 31.036948] [<ffffffff8100cb0a>] child_rip+0xa/0x20
[ 31.042174] [<ffffffff81050e43>] ? kthread+0x0/0x82
[ 31.047294] [<ffffffff8100cb00>] ? child_rip+0x0/0x20
[ 31.052655] Code: 87 e0 00 00 00 48 8b 47 08 48 89 77 08 48 89 3e 48 89 46 08 48 89 30 c9 c3 31 c0 48 39 3f 55 48 8b 57 18 48 89 e5 75 13 48 8b 02 <48> 8b 50 40 b8 01 00 00 00 48 85 d2 74 02 ff d2 c9 c3 48 8b 47
[ 31.072919] RIP [<ffffffff811c1eeb>] elv_queue_empty+0x12/0x24
[ 31.079094] RSP <ffff880028203d28>
[ 31.082736] CR2: 0000000000000040
[ 31.086171] ---[ end trace d6541ba31725c49a ]---
[ 31.090995] Kernel panic - not syncing: Fatal exception in interrupt
[ 31.097613] Pid: 4, comm: ksoftirqd/0 Tainted: G D 2.6.32.61.TEK.V7.12.1.5024.p61 #1
[ 31.106415] Call Trace:
[ 31.108992] <IRQ> [<ffffffff81424681>] panic+0x84/0x139
[ 31.114663] [<ffffffff814273bd>] oops_end+0xa9/0xb9
[ 31.119842] [<ffffffff810280ac>] no_context+0x136/0x142
[ 31.125361] [<ffffffff8105ccb1>] ? tick_program_event+0x25/0x27
[ 31.131604] [<ffffffff8102822a>] __bad_area_nosemaphore+0x172/0x195
[ 31.138161] [<ffffffff8103671c>] ? try_to_wake_up+0x294/0x2af
[ 31.144223] [<ffffffff8102825b>] bad_area_nosemaphore+0xe/0x10
[ 31.150441] [<ffffffff81428919>] do_page_fault+0x14a/0x281
[ 31.156245] [<ffffffff814268ff>] page_fault+0x1f/0x30
[ 31.161633] [<ffffffff811c1eeb>] ? elv_queue_empty+0x12/0x24
[ 31.167644] [<ffffffff811c434b>] __blk_run_queue+0x22/0x74
[ 31.173421] [<ffffffff811c43be>] blk_run_queue+0x21/0x35
[ 31.179068] [<ffffffff812a9040>] scsi_run_queue+0x20a/0x2a8
[ 31.184994] [<ffffffff812a9be7>] scsi_next_command+0x36/0x46
[ 31.190927] [<ffffffff812aa193>] scsi_end_request+0x7e/0x8f
[ 31.196808] [<ffffffff812aa4aa>] scsi_io_completion+0x16b/0x396
[ 31.203044] [<ffffffff812a46fc>] scsi_finish_command+0xb0/0xb9
[ 31.209261] [<ffffffff812aa7d8>] scsi_softirq_done+0xf3/0xfc
[ 31.215203] [<ffffffff811c8f27>] blk_done_softirq+0x67/0x77
[ 31.221067] [<ffffffff81042fe7>] __do_softirq+0xaa/0x147
[ 31.226619] [<ffffffff8100cc0c>] call_softirq+0x1c/0x28
[ 31.232170] <EOI> [<ffffffff8100e023>] do_softirq+0x33/0x6b
[ 31.238174] [<ffffffff810431e5>] ksoftirqd+0x82/0x149
[ 31.243511] [<ffffffff81043163>] ? ksoftirqd+0x0/0x149
[ 31.248935] [<ffffffff81050ebd>] kthread+0x7a/0x82
[ 31.253986] [<ffffffff8100cb0a>] child_rip+0xa/0x20
[ 31.259156] [<ffffffff81050e43>] ? kthread+0x0/0x82
[ 31.264311] [<ffffffff8100cb00>] ? child_rip+0x0/0x20
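For readers following the oops above, here is a minimal sketch of how the "IP:" and "Code:" lines can be broken down. It is not part of the tooling used for this report, and the helper names (parse_ip, split_code) are hypothetical. It only does arithmetic on the values printed in the log: it subtracts the offset from the faulting address to get the start of elv_queue_empty, and it uses the <..> marker in the Code: line to separate the bytes before the fault from the faulting instruction.

```python
import re

# Both strings are copied verbatim from the console log above.
OOPS_IP = "IP: [<ffffffff811c1eeb>] elv_queue_empty+0x12/0x24"
OOPS_CODE = (
    "87 e0 00 00 00 48 8b 47 08 48 89 77 08 48 89 3e 48 89 46 08 "
    "48 89 30 c9 c3 31 c0 48 39 3f 55 48 8b 57 18 48 89 e5 75 13 "
    "48 8b 02 <48> 8b 50 40 b8 01 00 00 00 48 85 d2 74 02 ff d2 "
    "c9 c3 48 8b 47"
)


def parse_ip(line):
    """Split an oops 'IP:' line into (address, symbol, offset, size)."""
    m = re.search(
        r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", line
    )
    if m is None:
        raise ValueError("not an oops IP line: %r" % line)
    addr, sym, off, size = m.groups()
    return int(addr, 16), sym, int(off, 16), int(size, 16)


def split_code(code):
    """Return (bytes before the fault, bytes from the faulting byte on)."""
    tokens = code.split()
    # The kernel wraps the first byte of the faulting instruction in <>.
    idx = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    clean = [t.strip("<>") for t in tokens]
    return clean[:idx], clean[idx:]


addr, sym, off, size = parse_ip(OOPS_IP)
func_start = addr - off                 # address of the function's first byte
before, after = split_code(OOPS_CODE)
# The last `off` bytes before the marker belong to the faulting function.
func_bytes = before[len(before) - off:] + after

print(f"{sym} starts at {func_start:#x}, fault at +{off:#x} of {size:#x}")
print("function prologue bytes:", " ".join(func_bytes[:off]))
print("faulting instruction starts with:", " ".join(after[:4]))
```

The marked bytes 48 8b 50 40 appear to encode a load from 0x40(%rax); with RAX = 0 in the register dump, that would be consistent with the CR2 value of 0x40. The kernel tree's scripts/decodecode can be used to disassemble the full Code: line rather than relying on a by-hand reading like this one.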
KERNEL CRASH DUMP:
crash 5.0.6
Copyright (C) 2002-2010 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
SYSTEM MAP: System.map-2.6.32.61.TEK.V7.12.1.5024.p61
DEBUG KERNEL: vmlinux-2.6.32.61.TEK.V2013.07.08.12.26.13.jrich (2.6.32.61.TEK.V2013.07.08.12.26.13.jrich)
DUMPFILE: DUMP [PARTIAL DUMP]
CPUS: 8
DATE: Fri Jul 5 17:33:19 2013
UPTIME: 00:00:30
LOAD AVERAGE: 1.20, 0.27, 0.09
TASKS: 186
NODENAME: (none)
RELEASE: 2.6.32.61.TEK.V7.12.1.5024.p61
VERSION: #1 SMP PREEMPT Fri Jul 5 12:58:36 CDT 2013
MACHINE: x86_64 (2133 Mhz)
MEMORY: 32 GB
PANIC: "[ 30.788431] Oops: 0000 [#1] PREEMPT SMP " (check log for details)
PID: 25
COMMAND: "ksoftirqd/7"
TASK: ffff88083f94a820 [THREAD_INFO: ffff88083f966000]
CPU: 7
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 25 TASK: ffff88083f94a820 CPU: 7 COMMAND: "ksoftirqd/7"
#0 [ffff8800282e3a20] machine_kexec at ffffffff8102274d
#1 [ffff8800282e3a80] crash_kexec at ffffffff81067e82
#2 [ffff8800282e3b50] oops_end at ffffffff81427349
#3 [ffff8800282e3b80] no_context at ffffffff810280ac
#4 [ffff8800282e3bc0] __bad_area_nosemaphore at ffffffff8102822a
#5 [ffff8800282e3c10] bad_area_nosemaphore at ffffffff8102825b
#6 [ffff8800282e3c20] do_page_fault at ffffffff81428919
#7 [ffff8800282e3c70] page_fault at ffffffff814268ff
[exception RIP: elv_queue_empty+18]
RIP: ffffffff811c1eeb RSP: ffff8800282e3d28 RFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88083d7dec98 RCX: ffff88083d7dec98
RDX: ffff88083c53d440 RSI: 0000000000000292 RDI: ffff88083d7dec98
RBP: ffff8800282e3d28 R8: ffff8800282e3d68 R9: ffffffff81436400
R10: ffff88083c574850 R11: ffff8800282e3d68 R12: 0000000000000292
R13: ffff8800282e3d98 R14: ffff88083c574850 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff8800282e3d30] __blk_run_queue at ffffffff811c434b
#9 [ffff8800282e3d50] blk_run_queue at ffffffff811c43be
#10 [ffff8800282e3d70] scsi_run_queue at ffffffff812a9040
#11 [ffff8800282e3de0] scsi_next_command at ffffffff812a9be7
#12 [ffff8800282e3e10] scsi_end_request at ffffffff812aa193
#13 [ffff8800282e3e50] scsi_io_completion at ffffffff812aa4aa
#14 [ffff8800282e3ec0] scsi_finish_command at ffffffff812a46fc
#15 [ffff8800282e3ef0] scsi_softirq_done at ffffffff812aa7d8
#16 [ffff8800282e3f20] blk_done_softirq at ffffffff811c8f27
#17 [ffff8800282e3f50] __do_softirq at ffffffff81042fe7
#18 [ffff8800282e3fb0] call_softirq at ffffffff8100cc0c
--- <IRQ stack> ---
#19 [ffff88083f967e68] do_softirq at ffffffff8100e023
#20 [ffff88083f967e88] ksoftirqd at ffffffff810431e5
#21 [ffff88083f967ed8] kthread at ffffffff81050ebd
#22 [ffff88083f967f48] kernel_thread at ffffffff8100cb0a
Download attachment "kernelConfig" of type "application/octet-stream" (60247 bytes)
View attachment "lspci.lsmod.txt" of type "text/plain" (6959 bytes)