lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <3ae3aa420901021125n1153053fsdf2378e7d11abbc0@mail.gmail.com>
Date:	Fri, 2 Jan 2009 13:25:38 -0600
From:	"Linas Vepstas" <linasvepstas@...il.com>
To:	linux-kernel@...r.kernel.org
Cc:	"Jeffrey J. Kosowsky" <jeff@...owsky.org>,
	MentalMooMan <slashdot@...eshallam.info>,
	"Travis Crump" <pretzalz@...hhouse.org>,
	Goodgerster <goodgerster@...il.com>, burdell@...ntheinter.net
Subject: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009

Slashdot reported a story of Linux machines crashing on New years eve.

http://ask.slashdot.org/article.pl?sid=09/01/01/1930202

Below follows a summary of the reported crashes. I'm ignoring the
zillions of "mine didn't crash" reports, or the "you're a paranoid
conspiracy theorist, its random chance" reports.

So far, 31 users reported 53 hard crashes at/near midnight, new years.
Symptoms are:
-- hard hang (systems not pingable)
-- irq's not serviced (if disk was active at time of crash,
   the disk activity light stays lit)
-- cold reboot (poweroff) required
-- systems work normally after reboot
-- no messages in syslog, no kernel oops, no core file crash dumps
-- not reproducible (simply setting the clock back is not enough
   to reproduce; guessing that a simulation of stratum 0 ntp server
   is needed to force the leap-second.)
-- The affected machines seem to be running either 2.6.21, 2.6.26 or 2.6.27

Suspect its an kernel race condition triggered by ntp bumping the second.
-- its the leap second, since this doesn't happen other years,
-- its a race condition, since some identically configured machines
   didn't go down, while others did.
-- its a race condition, since majority of systems were not affected.
-- its a race condition, since affected systems seem to have been
   mostly non-idle servers, or some non-idle desktops/ tv set-tops.
-- ntpd is the only service that monkeys with time adjustments.

There is a "well-known" deadlock in 2.6.21 kernels that caused this:
http://www.mail-archive.com/git-commits-head@vger.kernel.org/msg15039.html

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=746976a301ac9c9aa10d7d42454f8d6cdad8ff2b;hp=872aad45d6174570dd2e1defc3efee50f2cfcc72

Here's the synopsis of individual reports.  Most folks left very little
info, nor did they leave a way to contact them.


aputerguy
Fedora 8 Linux server 2.6.26.6-49.fc8 Intel p4 2.8GHz.
ASUS P4PE 2 1-TB Seagate SATA hardrives, 1 200MB PATA drive. 2GB DDR. 1
pchdtv5500 card, 1 winfast 2000XP tv card, 1 nVidia 6200 graphics card.


MentalMooMan (785571)  <slashdotNO@...Mjameshallam.info>
mythtv box (running mythbuntu) used to be something like 7.10 upgraded to 8.10
2.6.27-9.19-generic ntpd version 4.2.4p4. The CPU is an AMD Athlon XP
1700+ or 1800+.  The motherboard is an EPOX EP-8KTA3Pro. message in
/var/log/messages at boot-time:
"warning: `ntpd' uses 32-bit capabilities (legacy support in use)"


Anonymous Coward  MythTV box on Fedora 8 (Athlon XP1700+)


athakur999 (44340)  Mythbuntu-based HTPC


AZPolarBear (661815) Fedora 8 system


Anonymous Coward 5 of about 70 of our production servers


Anonymous Coward   I did have two 2.6.21 servers crash last night


Anonymous Coward Ubuntu 8.10 MythTV box.


SanjuroE (131728) Debian testing and at that time Debian kernel
2.6.26-11.


lukas84 (912874) internal testing machine that's still on 2.6.21


Anonymous Coward Debian testing Kernel 2.6.26


zerosumgame (1429741) kernel 2.6.21 on older Dell 1850's


Wibla (677432)  Both my fileservers running debian etch installed from
custom install media (pre-etchnhalf) running 2.6.21-2 and 2.6.21-6
crashed,


Maow (620678)  Ubuntu 8.04 on AMD64


Burdell (228580)  <burdell@...ntheinter.net> RHEL 4 server
RHEL 4 update 6
kernel-smp-2.6.9-67.0.7.EL.x86_64
ntp-4.2.0.a.20040617-6.el4.x86_64
Penguin Computing Altus 2600
dual dual-core Opteron 2212 HE
4G RAM
nVidia MCP55 chipset
I have 9 servers (mostly
different hardware, but one the same as above), all running the exact
same kernel and package set.  Only one crashed; the others logged the
leap second and went on fine.


Pretzalzz (577309) Travis Crump <pretzalz@...hhouse.org>
http://lists.debian.org/debian-user/2009/01/msg00006.html
debian lenny ntp=1:4.2.4p4+dfsg-7;
Linux version 2.6.26-1-amd64 (Debian 2.6.26-4[since updated])
Processor  : 2x AMD Athlon(tm) 64 X2 Dual Core Processor 3800+


kst (168867) Ubuntu 8.10


arodland (127775) Debian 2.6.21


Goodgerster (904325)  <goodgerster AT gmail DOT com>
Debian


Morgor (542294)  Fedora 8 Two of our production servers running fedora 8


Lightjumper (532700) Fedora 9 server


blit (90883) 10 machines running Fedora Core 7


Qwell (684661) Ubuntu 8.10


Jim Fenton (514449) Fedora 8 machine (kernel: 2.6.26.6-49.fc8)


Anonymous Coward  My laptop


dmrobbin (560931)   F8 box went down,


Anonymous Coward  F8 server also "hung"


Anonymous Coward 2.6.26-1-amd64. Four of the 20 locked up


Anonymous Coward (actually, me, Linas) amd64 dual core, ubuntu 8.04
custom-compiled kernel, 2.6.26-64-bit


Anonymous Coward kernel 2.6.26.3-29.fc9.


Anonymous Coward  3 Ubuntu 8.10 Virtual Box (3 of them)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ