Date:	Tue, 24 Jun 2008 23:05:11 +0200
From:	william <william@...sse.org>
To:	"Alan Cox" <alan@...rguk.ukuu.org.uk>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: strange freeze with VIA C7 dedicated server and libc 2.6.1

> Except for bugs in glibc that trigger things happening as root which go
> on to do stuff like power down the system (root is allowed to power
> down/reboot/etc). That is a fairly unlikely case.

 Yes, I know this sounds really unbelievable, with nothing in
the logs . . . but it happens to at least 20 people: all the upgraded
boxes have the problem, and on all the downgraded boxes the problem
disappears.

>> that is triggering the bug. Regardless of what that is and whether it should be
>> doing it, it shouldn't completely hang the kernel."
> The first thing is to find out which glibc version is the latest that
> works, which is the earliest that fails.
 Yes, but I couldn't test it myself on a production dedicated server.

 The only things which are 100% sure:
gentoo: upgrading from glibc-2.5-r4 to glibc-2.6.1 makes the problem appear.
debian: upgrading from 2.3.6.ds1-3 to 2.3.6.ds1-13etch5 makes the
problem appear.
All the debian users who downgraded their libc to 2.3.6.ds1-3 see the
problem disappear.
( I suppose the -13 in debian package name means 2.6.3+many patches,
probably the  2.3.6.ds1-13etch5 is a 2.6.x ? )

( I couldn't downgrade libc on gentoo; downgrading libc on gentoo is
a nearly suicidal idea )

 But now I have good news: the dedibox.fr admins agreed to lend us a
box for testing purposes.

 I can offer a testing shell with unlimited sudo to any kernel
developer interested in investigating this mystery who has a
gnupg key and a web of trust ( mine is
http://pgpkeys.mit.edu:11371/pks/lookup?op=vindex&search=0x690B4E07 we
probably have a trust path ).

> Second is to try and find out
> what apps or event is the trigger for the fail (eg can you boot into text
> mode with init s and then run 2 or 3 cpu hogs all day)

 I have only a few details on this point:

* my box freezes during the morning sql updates ( updating 300 MB of SQL
over 3 hours every morning ), but the script is launched with nice
-20
* crontab could be related to the problem: it seems to me that I have
fewer freezes since I split one big crontab ( launching a 3 hour
long script ) into 4 smaller crontabs, and some other users said that
disabling big crontabs helped
* the load is not so big, often between 1 and 2

 Another thing I did not say in the first mail: after the problem
appeared I installed lm_sensors and watchdog to try to investigate the
problem:

* the temperature is never higher than 54°C, which seems ok for a VIA
C7, am I wrong ? some people say 54°C is ok, some others say it's not
normal for a VIA C7 in a datacenter . . .

* the watchdog says nothing in the logs, but is able to reboot the box.
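
 Since nothing reaches the logs before a freeze, one way to narrow
down when it happens is a heartbeat that is forced to disk on every
write. The following is only a minimal sketch, not something that was
run on the affected boxes; the function names and the log path in the
usage note are hypothetical.

```python
import os
import time


def heartbeat_line(now=None):
    """Build one log line: UNIX timestamp plus 1/5/15-minute load averages."""
    if now is None:
        now = time.time()
    load1, load5, load15 = os.getloadavg()
    return "%d load=%.2f,%.2f,%.2f\n" % (now, load1, load5, load15)


def write_heartbeat(path, now=None):
    """Append one heartbeat and force it to disk before returning."""
    line = heartbeat_line(now)
    with open(path, "a") as log:
        log.write(line)
        log.flush()
        # fsync so the entry survives a hard freeze right after this call
        os.fsync(log.fileno())
    return line


def run(path, interval=10, count=None):
    """Write a heartbeat every `interval` seconds; count=None means forever."""
    written = 0
    while count is None or written < count:
        write_heartbeat(path)
        written += 1
        if count is None or written < count:
            time.sleep(interval)
    return written
```

 Calling run("/var/log/heartbeat.log", interval=10) (path is an
assumption) would leave the last timestamp and load average written
before the box froze, which can then be compared with the cron
schedule.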

 Thank you very much for your answer Alan, I was hesitating about
posting a report with no logs, no clues . . . your answer gives me a
little hope ;)


-- 
Regards

 William Waisse
  http://waisse.org | http://neoskills.com
   http://cahierspip.ww7.be | http://feeder.ww7.be
