Date:	Mon, 30 Mar 2009 19:36:08 +0200
From:	Fabio Coatti <cova@...rara.linux.it>
To:	linux-kernel@...r.kernel.org
Subject: [BUG] spinlock lockup on CPU#0

Hi all, I've got the following BUG: report on one of our servers running 
2.6.28.8. Some background:
we are seeing several lockups on db (mysql) servers that show up as a sudden 
load increase, after which the server very quickly freezes. They happen at 
random, sometimes after weeks, sometimes very soon after a system 
reboot. To track down the problem we installed the latest (at the time of 
the test) 2.6.28.X kernel and loaded it with heavy disk I/O operations (find, 
dd, rsync and so on).
We were able to crash a server with these tests; unfortunately we could 
capture only a remote screen snapshot, so I copied the data by hand 
(hopefully without typos). The result is the following:

 [<ffffffff80213590>] ? default_idle+0x30/0x50
 [<ffffffff8021358e>] ? default_idle+0x2e/0x50
 [<ffffffff80213793>] ? c1e_idle+0x73/0x120
 [<ffffffff80259f11>] ? atomic_notifier_call_chain+0x11/0x20
 [<ffffffff8020a31f>] ? cpu_idle+0x3f/0x70
BUG: spinlock lockup on CPU#0, find/13114, ffff8801363d2c80
Pid: 13114, comm: find Tainted: G      D W  2.6.28.8 #5
Call Trace:
 [<ffffffff8041a02e>] _raw_spin_lock+0x14e/0x180
 [<ffffffff8060b691>] _spin_lock+0x51/0x70
 [<ffffffff80231ca4>] ? task_rq_lock+0x54/0xa0
 [<ffffffff80231ca4>] task_rq_lock+0x54/0xa0
 [<ffffffff80234501>] try_to_wake_up+0x91/0x280
 [<ffffffff80234720>] wake_up_process+0x10/0x20
 [<ffffffff803bf863>] xfsbufd_wakeup+0x53/0x70
 [<ffffffff802871e0>] shrink_slab+0x90/0x180
 [<ffffffff80287526>] try_to_free_pages+0x256/0x3a0
 [<ffffffff80285280>] ? isolate_pages_global+0x0/0x280
 [<ffffffff80281166>] __alloc_pages_internal+0x1b6/0x460
 [<ffffffff802a186d>] alloc_page_vma+0x6d/0x110
 [<ffffffff8028d3ab>] handle_mm_fault+0x4ab/0x790
 [<ffffffff80225293>] do_page_fault+0x463/0x870
 [<ffffffff8060b199>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff8060bf52>] error_exit+0x0/0xa9
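
Since the trace above was copied by hand, a quick consistency check can catch some transcription typos. The following is only a minimal sketch, not part of the original report: the helper names (parse_trace, bad_offsets, inconsistent_bases) and the four-line SAMPLE are illustrative assumptions. It relies on two properties of the sym+0xoff/0xsize notation: the offset should be smaller than the symbol size, and every frame that names the same symbol should give the same base address (address minus offset).

```python
import re

# One frame looks like: " [<ffffffff80213590>] ? default_idle+0x30/0x50"
# The optional "?" marks an entry the kernel considers unreliable.
TRACE_LINE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# A few lines taken from the trace in this message (illustrative subset).
SAMPLE = """\
 [<ffffffff80213590>] ? default_idle+0x30/0x50
 [<ffffffff8021358e>] ? default_idle+0x2e/0x50
 [<ffffffff8041a02e>] _raw_spin_lock+0x14e/0x180
 [<ffffffff80234501>] try_to_wake_up+0x91/0x280
"""


def parse_trace(text):
    """Return one dict per stack frame found in text; other lines are skipped."""
    frames = []
    for line in text.splitlines():
        m = TRACE_LINE.search(line)
        if m is None:
            continue
        frames.append({
            "addr": int(m.group("addr"), 16),
            "sym": m.group("sym"),
            "off": int(m.group("off"), 16),
            "size": int(m.group("size"), 16),
            "reliable": m.group("unreliable") is None,
        })
    return frames


def bad_offsets(frames):
    """Frames whose offset is not smaller than the symbol size (likely a typo)."""
    return [f for f in frames if f["off"] >= f["size"]]


def inconsistent_bases(frames):
    """Symbols whose frames disagree on the base address (addr - off)."""
    bases = {}
    for f in frames:
        bases.setdefault(f["sym"], set()).add(f["addr"] - f["off"])
    return {sym: b for sym, b in bases.items() if len(b) > 1}
```

Running parse_trace on the full copied text and finding both bad_offsets and inconsistent_bases empty does not prove the copy is error-free; it only rules out the typos these two checks can detect.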

The machine is a dual AMD 2216HE (2 cores) with 4 GB of RAM; the .config file 
(from /proc/config.gz) is attached below.

We have been seeing similar lockups (similar at least in their effects) for 
several kernel revisions (starting from 2.6.25.X) and on different hardware. 
Several machines are hit by this, mostly databases (maybe because of their 
specific usage, the other machines being apache servers; I don't know).

Could someone give us some hints about this issue, or at least some 
suggestions on how to dig into it? Of course we can do any sort of testing 
and experiments.

Thanks for any answer.




Download attachment "config_bug.gz" of type "application/x-gzip" (8565 bytes)
