Message-ID: <20130513034835.GA1130@marklar.spinoli.org>
Date: Sun, 12 May 2013 23:48:35 -0400
From: Hank Leininger <hlein@...c.info>
To: linux-kernel@...r.kernel.org
Subject: BUG: spinlock lockup, async_umap_flush_lock in 3.4, 3.7, 3.8
I've got several systems with similar hardware which crash with BUG:
spinlock errors on async_umap_flush_lock such as:
BUG: spinlock lockup suspected on CPU#0, sh/1166
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
BUG: spinlock lockup suspected on CPU#19, scsi_eh_0/1408
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
(More examples below.)
In general these happen very rarely, but a specific userland workload
(lots of mongodb + sqlite reads & writes, while other CPUs are running
compute-heavy tasks) seems to trigger them within a few minutes to hours.
After 1-3 "spinlock lockup suspected" errors, the system locks up
completely and no longer responds to alt+sysrq.
I've gotten the crash on one system in the last couple of days with
3.7.1-gentoo, 3.8.11-gentoo, 3.8.11 vanilla, and 3.4.44 vanilla. When
I looked further back, over the past year another system crashed with
similar errors (under similar workload) running 3.7.0-gentoo and
3.8.4-gentoo. Further back than that there are 2-3 crashes on those
and other similar systems using 2.6.x and 3.0.x, but their errors are
different enough that they may not be related.
These systems each have:
  Supermicro X8DTU-F motherboard
  2x Xeon E5645 (6 cores each + hyperthreading)
  24 GB ECC RAM
  Adaptec 51645 RAID controller w/bbu
  12x 2TB SAS disks
They use hardware RAID: 11 disks in a RAID6 with 1 hot spare; the main
partition is 16 TB.
They all use loop-aes v3.6g as a replacement loop.ko module to encrypt
their / filesystem (using the aes-ni instruction set).
3.8.11 .config pastebin: http://pastebin.com/u3BDPTvP
3.4.44 .config pastebin: http://pastebin.com/1Rpk9RVf
Generally speaking, 3.8.x and 3.4.44 kernels were compiled with GCC 4.7;
the older 3.7.x kernels were compiled with GCC 4.6.
Error messages, captured by serial consoles, newest crashes first:
Host1:
3.4.44
BUG: spinlock lockup on CPU#0, john/21637
lock: ffffffff816558d0, .magic: dead4ead, .owner: mongod/27646, .owner_cpu: 8
BUG: spinlock lockup on CPU#6, mongod/3256
lock: ffff880621867860, .magic: dead4ead, .owner: mongod/3251, .owner_cpu: 18
BUG: spinlock lockup on CPU#20, khugepaged/735
lock: ffff880621867860, .magic: dead4ead, .owner: mongod/3251, .owner_cpu: 18
3.8.11
BUG: spinlock lockup suspected on CPU#0, sh/1166
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
BUG: spinlock lockup suspected on CPU#19, scsi_eh_0/1408
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
3.8.11-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongod/3678, .owner_cpu: 4
BUG: spinlock lockup suspected on CPU#16, mongod/3115
lock: 0xffff880620ab47a8, .magic: dead4ead, .owner: flush-7:4/1915, .owner_cpu: 5
BUG: spinlock lockup suspected on CPU#6, khugepaged/744
lock: 0xffff880620ab47a8, .magic: dead4ead, .owner: flush-7:4/1915, .owner_cpu: 5
3.7.1-gentoo
BUG: spinlock lockup suspected on CPU#0, john/32030
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/13/0, .owner_cpu: 13
BUG: spinlock lockup suspected on CPU#19, mongod/18985
lock: 0xffff8806221f7860, .magic: dead4ead, .owner: mongod/18975, .owner_cpu: 2
BUG: spinlock lockup suspected on CPU#3, scsi_eh_0/1407
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/13/0, .owner_cpu: 13
BUG: spinlock lockup suspected on CPU#9, khugepaged/741
lock: 0xffff8806221f7860, .magic: dead4ead, .owner: mongod/18975, .owner_cpu: 2
Host2:
3.8.4-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongod/22377, .owner_cpu: 9
BUG: spinlock lockup suspected on CPU#4, mongod/3377
lock: 0xffff880621d00f68, .magic: dead4ead, .owner: kswapd0/689, .owner_cpu: 14
BUG: spinlock lockup suspected on CPU#21, mongod/3375
lock: 0xffff880621d00f68, .magic: dead4ead, .owner: kswapd0/689, .owner_cpu: 14
3.7.0-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongo/16561, .owner_cpu: 3
(The repeated crashes on Host2 led to irreparable ext4 corruption.)
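To compare the reports above across kernels, it may help to group them by lock. The following is only a rough sketch, assuming the serial console output keeps the two-line "BUG: ..." / "lock: ..." layout shown above; the function names parse_lockups and group_by_lock are made up for illustration and are not part of any existing tool.

```python
import re
from collections import defaultdict

# First line of a report, with or without the word "suspected"
# (3.4.44 omits it, 3.7.x and 3.8.x include it).
BUG_RE = re.compile(r"BUG: spinlock lockup(?: suspected)? on CPU#(\d+), (\S+)")
# Second line of a report: lock identity plus owner details.
LOCK_RE = re.compile(
    r"lock: ([^,]+), \.magic: (\w+), \.owner: ([^,]+), \.owner_cpu: (-?\d+)"
)


def parse_lockups(text):
    """Return one dict per BUG/lock line pair found in the console text."""
    reports = []
    pending = None  # (waiting CPU, waiting task) from the last BUG line
    for line in text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            pending = (int(bug.group(1)), bug.group(2))
            continue
        lock = LOCK_RE.search(line)
        if lock and pending is not None:
            reports.append({
                "cpu": pending[0],
                "task": pending[1],
                "lock": lock.group(1).strip(),
                "magic": lock.group(2),
                "owner": lock.group(3).strip(),
                "owner_cpu": int(lock.group(4)),
            })
            pending = None
    return reports


def group_by_lock(reports):
    """Map each lock (symbol or raw address) to the reports that name it."""
    groups = defaultdict(list)
    for report in reports:
        groups[report["lock"]].append(report)
    return dict(groups)
```

Feeding it one crash's console capture shows which CPUs were spinning on which lock and which task was recorded as the owner, which makes it easier to see that several waiters point at the same owner.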
I can provide System.map files if they are interesting. I'd be happy
to try a specific kernel, add patches to harvest more information in
the event of a crash, etc.
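For the reports that print a raw lock address rather than a symbol (the 3.4.44 ones, for example), a System.map lookup could look roughly like the sketch below. It assumes the usual "address type symbol" line format of System.map; the helper names load_system_map and resolve are hypothetical. Only addresses of statically allocated objects can resolve this way; a very large offset in the result suggests the address belongs to dynamically allocated memory and the match is meaningless.

```python
import bisect


def load_system_map(lines):
    """Parse 'address type symbol' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue  # first field is not a hex address
        symbols.append((addr, parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (symbol, offset) for the nearest symbol at or below addr.

    Returns None if addr lies below every symbol in the map.
    """
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base
```

Usage would be along the lines of resolve(load_system_map(open("System.map")), 0xffffffff816558d0), using the System.map that matches the kernel which produced the report.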
Thanks,
--
Hank Leininger <hlein@...c.info>
3C2A 4EEE ED36 D136 18F2 1B30 47A8 D14B E13E 9C6A