Message-ID: <2ffbcf00806241405w5b757b56p84cd560166ea8f90@mail.gmail.com>
Date: Tue, 24 Jun 2008 23:05:11 +0200
From: william <william@...sse.org>
To: "Alan Cox" <alan@...rguk.ukuu.org.uk>
Cc: linux-kernel@...r.kernel.org
Subject: Re: strange freeze with VIA C7 dedicated server and libc 2.6.1
> Except for bugs in glibc that trigger things happening as root which go
> on to do stuff like power down the system (root is allowed to power
> down/reboot/etc). That is a fairly unlikely case.
Yes, I know this is really hard to believe, with nothing in
the logs . . . but it happens to at least 20 people: all the upgraded
boxes have the problem, and on all the downgraded boxes the problem
disappears.
>> that is triggering the bug. Regardless of what that is and whether it should be
>> doing it, it shouldn't completely hang the kernel."
> The first thing is to find out which glibc version is the latest that
> works, which is the earliest that fails.
Yes, but I couldn't test that myself on a production dedicated server.
The only things we are 100% sure of:
gentoo : upgrade from glibc-2.5-r4 to glibc-2.6.1 makes the problem appear.
debian : upgrade from 2.3.6.ds1-3 to 2.3.6.ds1-13etch5 makes the
problem appear.
all the debian users who downgraded their libc to 2.3.6.ds1-3 see the
problem disappear.
( I suppose the -13 in debian package name means 2.6.3+many patches,
probably the 2.3.6.ds1-13etch5 is a 2.6.x ? )
( I couldn't downgrade libc on gentoo; downgrading libc on gentoo is
a nearly suicidal idea )
But now I have good news: the dedibox.fr admins agreed to lend us a
box for testing purposes.
I can offer a test shell with unlimited sudo to any kernel
developer interested in investigating this mystery who has a
gnupg key and a web of trust ( mine is
http://pgpkeys.mit.edu:11371/pks/lookup?op=vindex&search=0x690B4E07 , we
probably have a trust path ).
> Second is to try and find out
> what apps or event is the trigger for the fail (eg can you boot into text
> mode with init s and then run 2 or 3 cpu hogs all day)
I have only a few details on this point:
* my box freezes during the morning SQL updates ( updating 300 MB of SQL
over 3 hours every morning ), but the script is launched with nice -20
* crontab could be related to the problem: it seems to me that I have had
fewer freezes since I split one big crontab ( launching a 3 hour
long script ) into 4 smaller crontabs, and some other users said that
disabling big crontabs helped
* the load is not very high, usually between 1 and 2
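For anyone who wants to try the "2 or 3 cpu hogs" test suggested above, here is a minimal sketch, assuming Python is available on the test box. The function names `burn` and `run_hogs` are made up for illustration, and the worker count and duration are arbitrary; this is not something that has been run on the affected machines.

```python
import multiprocessing
import time


def burn(seconds):
    """Spin in a tight loop for roughly `seconds`; return the iteration count."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


def run_hogs(workers, seconds):
    """Start `workers` processes that each burn CPU for `seconds`, then wait.

    Returns the exit code of each worker process.
    """
    procs = [multiprocessing.Process(target=burn, args=(seconds,))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

Calling `run_hogs(3, 8 * 3600)` from a script would keep three processes busy for about eight hours; if the box survives that in single-user mode, plain CPU load alone is probably not the trigger.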
Another thing I did not mention in the first mail: after the problem
appeared I installed lm_sensors and watchdog to try to investigate the
problem:
* the temperature is never higher than 54°C, which seems OK for a VIA
C7, or am I wrong? Some people say 54°C is OK, others say it is not
normal for a VIA C7 in a datacenter . . .
* the watchdog says nothing in the logs, but is able to reboot the box.
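Since nothing reaches the logs, one way to at least pin down the time of each freeze would be a small heartbeat file that is forced to disk on every write. The sketch below is only an assumption about how that could be done; the helper names and the file path are hypothetical, and it would need to be called in a loop (for example once per second) by a wrapper script.

```python
import os
import time


def write_heartbeat(path):
    """Append the current Unix time to `path` and force it to disk."""
    with open(path, "a") as f:
        f.write(f"{time.time():.3f}\n")
        f.flush()
        # fsync so the timestamp survives a hard freeze or watchdog reboot
        os.fsync(f.fileno())


def last_heartbeat(path):
    """Return the last recorded timestamp as a float, or None if the file is empty."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    return float(lines[-1]) if lines else None
```

After a watchdog reboot, `last_heartbeat("/var/tmp/heartbeat.log")` (path chosen arbitrarily here) would give the last moment the box was still alive, which could then be compared with the crontab schedule.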
Thank you very much for your answer Alan, I was hesitant about
posting a report with no logs, no clues . . . your answer gives me a
little hope ;)
--
Regards
William Waisse
http://waisse.org | http://neoskills.com
http://cahierspip.ww7.be | http://feeder.ww7.be