Message-ID: <636295BFF4A001418A00F46569A2CD2B161CE88B@US-PLNO-EXM01-P.global.tektronix.net>
Date:	Tue, 9 Jul 2013 17:42:29 +0000
From:	"Rich, Jason" <jason.rich@...comms.com>
To:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Panic at _blk_run_queue on 2.6.32

Greetings,
I've recently encountered an issue where multiple hosts fail to boot roughly one time in five.  So far I have confirmed this
issue on three separate host machines.  The issue appears after updating from 2.6.32.39 to patch level 50 or patch level 61.
Both patch levels result in the failure described below.  Since this occurs on multiple hosts, I feel I can safely rule out hardware.

It is also worth noting that I have not seen this behavior on the 3.4.26 kernel, or on any of my 32-bit hosts.
That said, I have to support this software release (which runs on the 2.6 kernel) for at least another two years.
I've looked through the open and closed issues on bugzilla and see nothing similar.
The console log of the crash is below, along with output from the crash dump (using the crash tool).
Lsmod, lspci & kernel config attached.
I'm at a loss and consider myself a novice at debugging kernel issues.  Any help is greatly appreciated.

Some details about the host:
1x Intel Xeon L5518 (quad core + HT)
32G DDR3
on board eUSB
This is an ATCA blade (no doubt irrelevant to the issue)


From the console:
initramfs bootup: 2.6.32.61.TEK.V7.12.1.5024.p61 x86_64
<initramfs bootup...truncated as irrelevant to the issue at hand>

BOOT_IMAGE=/boot/bzImage-2.6.32.61.TEK.V7.12.1.5024.p61 -> /boot/bzImage-2.6.32.61.TEK.V7.12.1.5024.p61
<initramfs bootup...truncated as irrelevant to the issue at hand>

Setting kernel variables ... /etc/sysctl.conf...done.
Setting up X server socket directory /tmp/.X11-unix....
Setting up ICE socket directory /tmp/.ICE-unix....
Starting portmap daemon....
[   30.757040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[   30.765242] IP: [<ffffffff811c1eeb>] elv_queue_empty+0x12/0x24
[   30.771296] PGD 0 
[   30.773408] Oops: 0000 [#1] PREEMPT SMP 
[   30.777525] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb1/1-3/1-3:1.0/host9/target9:0:0/9:0:0:0/scsi_device/9:0:0:0/uevent
[   30.790203] CPU 0 
[   30.792253] Modules linked in: mptctl ipmi_poweroff igb ixgbe usb_storage ahci mptsas mptscsih mptbase scsi_transport_sas edd i2c_dev ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: usb_storage]
[   30.812346] Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.32.61.TEK.V7.12.1.5024.p61 #1 ATCA-4500
[   30.821280] RIP: 0010:[<ffffffff811c1eeb>]  [<ffffffff811c1eeb>] elv_queue_empty+0x12/0x24
[   30.829911] RSP: 0018:ffff880028203d28  EFLAGS: 00010046
[   30.835445] RAX: 0000000000000000 RBX: ffff88083cf7ec98 RCX: ffff88083cf7ec98
[   30.842885] RDX: ffff88083d3a88c0 RSI: 0000000000000292 RDI: ffff88083cf7ec98
[   30.850226] RBP: ffff880028203d28 R08: ffff880028203d68 R09: 0000000000ade46b
[   30.857708] R10: ffff88083ced5050 R11: ffff880028203d68 R12: 0000000000000292
[   30.865182] R13: ffff880028203d98 R14: ffff88083ced5050 R15: 0000000000000000
[   30.872594] FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
[   30.881111] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[   30.887087] CR2: 0000000000000040 CR3: 000000083db0a000 CR4: 00000000000006f0
[   30.894484] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   30.901905] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   30.909376] Process ksoftirqd/0 (pid: 4, threadinfo ffff88083f8a6000, task ffff88083f884080)
[   30.918231] Stack:
[   30.920368]  ffff880028203d48 ffffffff811c434b ffff88083cf7ec98 ffff88083cf7ec98
[   30.927955] <0> ffff880028203d68 ffffffff811c43be ffff88083ced5000 ffff88083cf7ec98
[   30.936074] <0> ffff880028203dd8 ffffffff812a9040 ffff880028203d88 ffff880028203d98
[   30.944283] Call Trace:
[   30.946853]  <IRQ> 
[   30.949053]  [<ffffffff811c434b>] __blk_run_queue+0x22/0x74
[   30.954855]  [<ffffffff811c43be>] blk_run_queue+0x21/0x35
[   30.960479]  [<ffffffff812a9040>] scsi_run_queue+0x20a/0x2a8
[   30.966331]  [<ffffffff812a9be7>] scsi_next_command+0x36/0x46
[   30.972265]  [<ffffffff812aa193>] scsi_end_request+0x7e/0x8f
[   30.978145]  [<ffffffff812aa4aa>] scsi_io_completion+0x16b/0x396
[   30.984382]  [<ffffffff812a46fc>] scsi_finish_command+0xb0/0xb9
[   30.990575]  [<ffffffff812aa7d8>] scsi_softirq_done+0xf3/0xfc
[   30.996578]  [<ffffffff811c8f27>] blk_done_softirq+0x67/0x77
[   31.002493]  [<ffffffff81042fe7>] __do_softirq+0xaa/0x147
[   31.008176]  [<ffffffff8100cc0c>] call_softirq+0x1c/0x28
[   31.013640]  <EOI> 
[   31.015819]  [<ffffffff8100e023>] do_softirq+0x33/0x6b
[   31.021149]  [<ffffffff810431e5>] ksoftirqd+0x82/0x149
[   31.026484]  [<ffffffff81043163>] ? ksoftirqd+0x0/0x149
[   31.031883]  [<ffffffff81050ebd>] kthread+0x7a/0x82
[   31.036948]  [<ffffffff8100cb0a>] child_rip+0xa/0x20
[   31.042174]  [<ffffffff81050e43>] ? kthread+0x0/0x82
[   31.047294]  [<ffffffff8100cb00>] ? child_rip+0x0/0x20
[   31.052655] Code: 87 e0 00 00 00 48 8b 47 08 48 89 77 08 48 89 3e 48 89 46 08 48 89 30 c9 c3 31 c0 48 39 3f 55 48 8b 57 18 48 89 e5 75 13 48 8b 02 <48> 8b 50 40 b8 01 00 00 00 48 85 d2 74 02 ff d2 c9 c3 48 8b 47 
[   31.072919] RIP  [<ffffffff811c1eeb>] elv_queue_empty+0x12/0x24
[   31.079094]  RSP <ffff880028203d28>
[   31.082736] CR2: 0000000000000040
[   31.086171] ---[ end trace d6541ba31725c49a ]---
[   31.090995] Kernel panic - not syncing: Fatal exception in interrupt
[   31.097613] Pid: 4, comm: ksoftirqd/0 Tainted: G      D    2.6.32.61.TEK.V7.12.1.5024.p61 #1
[   31.106415] Call Trace:
[   31.108992]  <IRQ>  [<ffffffff81424681>] panic+0x84/0x139
[   31.114663]  [<ffffffff814273bd>] oops_end+0xa9/0xb9
[   31.119842]  [<ffffffff810280ac>] no_context+0x136/0x142
[   31.125361]  [<ffffffff8105ccb1>] ? tick_program_event+0x25/0x27
[   31.131604]  [<ffffffff8102822a>] __bad_area_nosemaphore+0x172/0x195
[   31.138161]  [<ffffffff8103671c>] ? try_to_wake_up+0x294/0x2af
[   31.144223]  [<ffffffff8102825b>] bad_area_nosemaphore+0xe/0x10
[   31.150441]  [<ffffffff81428919>] do_page_fault+0x14a/0x281
[   31.156245]  [<ffffffff814268ff>] page_fault+0x1f/0x30
[   31.161633]  [<ffffffff811c1eeb>] ? elv_queue_empty+0x12/0x24
[   31.167644]  [<ffffffff811c434b>] __blk_run_queue+0x22/0x74
[   31.173421]  [<ffffffff811c43be>] blk_run_queue+0x21/0x35
[   31.179068]  [<ffffffff812a9040>] scsi_run_queue+0x20a/0x2a8
[   31.184994]  [<ffffffff812a9be7>] scsi_next_command+0x36/0x46
[   31.190927]  [<ffffffff812aa193>] scsi_end_request+0x7e/0x8f
[   31.196808]  [<ffffffff812aa4aa>] scsi_io_completion+0x16b/0x396
[   31.203044]  [<ffffffff812a46fc>] scsi_finish_command+0xb0/0xb9
[   31.209261]  [<ffffffff812aa7d8>] scsi_softirq_done+0xf3/0xfc
[   31.215203]  [<ffffffff811c8f27>] blk_done_softirq+0x67/0x77
[   31.221067]  [<ffffffff81042fe7>] __do_softirq+0xaa/0x147
[   31.226619]  [<ffffffff8100cc0c>] call_softirq+0x1c/0x28
[   31.232170]  <EOI>  [<ffffffff8100e023>] do_softirq+0x33/0x6b
[   31.238174]  [<ffffffff810431e5>] ksoftirqd+0x82/0x149
[   31.243511]  [<ffffffff81043163>] ? ksoftirqd+0x0/0x149
[   31.248935]  [<ffffffff81050ebd>] kthread+0x7a/0x82
[   31.253986]  [<ffffffff8100cb0a>] child_rip+0xa/0x20
[   31.259156]  [<ffffffff81050e43>] ? kthread+0x0/0x82
[   31.264311]  [<ffffffff8100cb00>] ? child_rip+0x0/0x20
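
For anyone reading the trace: the "Code:" line in the oops above marks the first byte of the faulting instruction with angle brackets. Below is a minimal Python sketch, not part of the original report, that splits that line at the marker and uses the +0x12 offset from the RIP line to find where elv_queue_empty starts within the dumped bytes. All function and variable names are hypothetical. The marked bytes 48 8b 50 40 encode "mov 0x40(%rax),%rdx"; with RAX at 0 in the register dump, that load would touch address 0x40, which is consistent with the CR2 value.

```python
# Code: line copied from the oops above (timestamp prefix stripped).
OOPS_CODE = (
    "87 e0 00 00 00 48 8b 47 08 48 89 77 08 48 89 3e 48 89 46 08 "
    "48 89 30 c9 c3 31 c0 48 39 3f 55 48 8b 57 18 48 89 e5 75 13 "
    "48 8b 02 <48> 8b 50 40 b8 01 00 00 00 48 85 d2 74 02 ff d2 "
    "c9 c3 48 8b 47"
)

# Offset of the faulting instruction inside the function,
# taken from "elv_queue_empty+0x12/0x24" on the RIP line.
FAULT_OFFSET = 0x12


def split_code_line(code):
    """Return (bytes before the marker, bytes from the marker onward)."""
    tokens = code.split()
    # The kernel wraps the first byte of the faulting instruction in <>.
    marker = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    before = bytes(int(tok, 16) for tok in tokens[:marker])
    after = bytes(int(tok.strip("<>"), 16) for tok in tokens[marker:])
    return before, after


def hexstr(data):
    """Format bytes as space-separated two-digit hex."""
    return " ".join(f"{b:02x}" for b in data)


before, after = split_code_line(OOPS_CODE)

# Bytes of the function that ran before the fault: the last
# FAULT_OFFSET bytes ahead of the marker.
function_prefix = before[-FAULT_OFFSET:]
faulting_insn = after[:4]

print("bytes before marker:", len(before))
print("function prefix    :", hexstr(function_prefix))
print("faulting bytes     :", hexstr(faulting_insn))
```

Run against the log above, this reports 43 bytes ahead of the marker, and the 18-byte function prefix begins with 31 c0. Feeding those bytes to a disassembler (for example objdump on the matching vmlinux) would be the more reliable way to confirm which pointer was NULL.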



KERNEL CRASH DUMP:
crash 5.0.6
Copyright (C) 2002-2010  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

  SYSTEM MAP: System.map-2.6.32.61.TEK.V7.12.1.5024.p61                
DEBUG KERNEL: vmlinux-2.6.32.61.TEK.V2013.07.08.12.26.13.jrich (2.6.32.61.TEK.V2013.07.08.12.26.13.jrich)
    DUMPFILE: DUMP  [PARTIAL DUMP]
        CPUS: 8
        DATE: Fri Jul  5 17:33:19 2013
      UPTIME: 00:00:30
LOAD AVERAGE: 1.20, 0.27, 0.09
       TASKS: 186
    NODENAME: (none)
     RELEASE: 2.6.32.61.TEK.V7.12.1.5024.p61
     VERSION: #1 SMP PREEMPT Fri Jul 5 12:58:36 CDT 2013
     MACHINE: x86_64  (2133 Mhz)
      MEMORY: 32 GB
       PANIC: "[   30.788431] Oops: 0000 [#1] PREEMPT SMP " (check log for details)
         PID: 25
     COMMAND: "ksoftirqd/7"
        TASK: ffff88083f94a820  [THREAD_INFO: ffff88083f966000]
         CPU: 7
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 25     TASK: ffff88083f94a820  CPU: 7   COMMAND: "ksoftirqd/7"
 #0 [ffff8800282e3a20] machine_kexec at ffffffff8102274d
 #1 [ffff8800282e3a80] crash_kexec at ffffffff81067e82
 #2 [ffff8800282e3b50] oops_end at ffffffff81427349
 #3 [ffff8800282e3b80] no_context at ffffffff810280ac
 #4 [ffff8800282e3bc0] __bad_area_nosemaphore at ffffffff8102822a
 #5 [ffff8800282e3c10] bad_area_nosemaphore at ffffffff8102825b
 #6 [ffff8800282e3c20] do_page_fault at ffffffff81428919
 #7 [ffff8800282e3c70] page_fault at ffffffff814268ff
    [exception RIP: elv_queue_empty+18]
    RIP: ffffffff811c1eeb  RSP: ffff8800282e3d28  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffff88083d7dec98  RCX: ffff88083d7dec98
    RDX: ffff88083c53d440  RSI: 0000000000000292  RDI: ffff88083d7dec98
    RBP: ffff8800282e3d28   R8: ffff8800282e3d68   R9: ffffffff81436400
    R10: ffff88083c574850  R11: ffff8800282e3d68  R12: 0000000000000292
    R13: ffff8800282e3d98  R14: ffff88083c574850  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff8800282e3d30] __blk_run_queue at ffffffff811c434b
 #9 [ffff8800282e3d50] blk_run_queue at ffffffff811c43be
#10 [ffff8800282e3d70] scsi_run_queue at ffffffff812a9040
#11 [ffff8800282e3de0] scsi_next_command at ffffffff812a9be7
#12 [ffff8800282e3e10] scsi_end_request at ffffffff812aa193
#13 [ffff8800282e3e50] scsi_io_completion at ffffffff812aa4aa
#14 [ffff8800282e3ec0] scsi_finish_command at ffffffff812a46fc
#15 [ffff8800282e3ef0] scsi_softirq_done at ffffffff812aa7d8
#16 [ffff8800282e3f20] blk_done_softirq at ffffffff811c8f27
#17 [ffff8800282e3f50] __do_softirq at ffffffff81042fe7
#18 [ffff8800282e3fb0] call_softirq at ffffffff8100cc0c
--- <IRQ stack> ---
#19 [ffff88083f967e68] do_softirq at ffffffff8100e023
#20 [ffff88083f967e88] ksoftirqd at ffffffff810431e5
#21 [ffff88083f967ed8] kthread at ffffffff81050ebd
#22 [ffff88083f967f48] kernel_thread at ffffffff8100cb0a

Download attachment "kernelConfig" of type "application/octet-stream" (60247 bytes)

View attachment "lspci.lsmod.txt" of type "text/plain" (6959 bytes)
