[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1199720934.11173.49.camel@localhost.localdomain>
Date: Mon, 07 Jan 2008 16:48:54 +0100
From: Brice Figureau <brice+lklm@...sofwonder.com>
To: linux-kernel@...r.kernel.org
Subject: Strange freeze on 2.6.22 (deadlock?)
Hi,
I'm seeing a strange complete server freeze/lock-up on an bi-Xeon HT
amd64 server running standard debian 2.6.22 (and before that vanilla
2.6.19.x and 2.6.20.x which exhibited the same issue).
I'm only reporting it now, since I could get a full sysrq-t only this
morning.
The symptoms are that every 5 to 7 days, the server (which acts as a MX
along with a few low traffic websites) locks-up. The ipmi watchdog is
unable to reboot the server (and doesn't even trigger, since there is no
evidence in the esmlog), the machine is still pingable. I can't ssh to
it, but I can enter my login & password on a serial console, but no
shell is started.
Pressing sysrq-t produced the trace hosted here:
http://www.daysofwonder.com/bug/crash-server1.txt.gz
It happened one time when I was connected to the server through ssh and
I could see that the load started to increase well above 100. It was
then impossible to launch new process from the command-line (and I had
to reboot manually).
It happened also last week, and the server was stuck for about 6 hours.
When I started investigating what was wrong, it slowly came back to life
(with an avg 1-min load of more than 1500, and tons of cron processes
running in parallel).
I'm not really familiar with kernel development so I can't really find
the issue in the aforementioned trace output.
What I think is that for some reason there is a race/deadlock that
finally prevents new processes to really start (which in turns produces
the high load).
What seems suspect in the aforementioned trace is:
*) lot of processes stacktrace ends in __mod_timer+0xc3/0xd3
which seems to be this line from kernel/timer.c
415 timer->expires = expires;
416 internal_add_timer(base, timer);
--> spin_unlock_irqrestore(&base->lock, flags);
419 return ret;
420 }
*) lot of processes stacktrace ends in __mutex_lock_slowpath and/or zone_statistics
Anyway, I will soon reboot to a 2.6.23.x to see if that symptom
persists.
More information (config, server specs) are available on request.
I'm not subscribed to the list, so please CC: me for any anwser.
Many thanks,
--
Brice Figureau <brice+lklm@...sofwonder.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists