[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <3ae3aa420901021125n1153053fsdf2378e7d11abbc0@mail.gmail.com>
Date: Fri, 2 Jan 2009 13:25:38 -0600
From: "Linas Vepstas" <linasvepstas@...il.com>
To: linux-kernel@...r.kernel.org
Cc: "Jeffrey J. Kosowsky" <jeff@...owsky.org>,
MentalMooMan <slashdot@...eshallam.info>,
"Travis Crump" <pretzalz@...hhouse.org>,
Goodgerster <goodgerster@...il.com>, burdell@...ntheinter.net
Subject: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
Slashdot reported a story of Linux machines crashing on New years eve.
http://ask.slashdot.org/article.pl?sid=09/01/01/1930202
Below follows a summary of the reported crashes. I'm ignoring the
zillions of "mine didn't crash" reports, or the "you're a paranoid
conspiracy theorist, its random chance" reports.
So far, 31 users reported 53 hard crashes at/near midnight, new years.
Symptoms are:
-- hard hang (systems not pingable)
-- irq's not serviced (if disk was active at time of crash,
the disk activity light stays lit)
-- cold reboot (poweroff) required
-- systems work normally after reboot
-- no messages in syslog, no kernel oops, no core file crash dumps
-- not reproducible (simply setting the clock back is not enough
to reproduce; guessing that a simulation of stratum 0 ntp server
is needed to force the leap-second.)
-- The affected machines seem to be running either 2.6.21, 2.6.26 or 2.6.27
Suspect its an kernel race condition triggered by ntp bumping the second.
-- its the leap second, since this doesn't happen other years,
-- its a race condition, since some identically configured machines
didn't go down, while others did.
-- its a race condition, since majority of systems were not affected.
-- its a race condition, since affected systems seem to have been
mostly non-idle servers, or some non-idle desktops/ tv set-tops.
-- ntpd is the only service that monkeys with time adjustments.
There is a "well-known" deadlock in 2.6.21 kernels that caused this:
http://www.mail-archive.com/git-commits-head@vger.kernel.org/msg15039.html
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=746976a301ac9c9aa10d7d42454f8d6cdad8ff2b;hp=872aad45d6174570dd2e1defc3efee50f2cfcc72
Here's the synopsis of individual reports. Most folks left very little
info, nor did they leave a way to contact them.
aputerguy
Fedora 8 Linux server 2.6.26.6-49.fc8 Intel p4 2.8GHz.
ASUS P4PE 2 1-TB Seagate SATA hardrives, 1 200MB PATA drive. 2GB DDR. 1
pchdtv5500 card, 1 winfast 2000XP tv card, 1 nVidia 6200 graphics card.
MentalMooMan (785571) <slashdotNO@...Mjameshallam.info>
mythtv box (running mythbuntu) used to be something like 7.10 upgraded to 8.10
2.6.27-9.19-generic ntpd version 4.2.4p4. The CPU is an AMD Athlon XP
1700+ or 1800+. The motherboard is an EPOX EP-8KTA3Pro. message in
/var/log/messages at boot-time:
"warning: `ntpd' uses 32-bit capabilities (legacy support in use)"
Anonymous Coward MythTV box on Fedora 8 (Athlon XP1700+)
athakur999 (44340) Mythbuntu-based HTPC
AZPolarBear (661815) Fedora 8 system
Anonymous Coward 5 of about 70 of our production servers
Anonymous Coward I did have two 2.6.21 servers crash last night
Anonymous Coward Ubuntu 8.10 MythTV box.
SanjuroE (131728) Debian testing and at that time Debian kernel
2.6.26-11.
lukas84 (912874) internal testing machine that's still on 2.6.21
Anonymous Coward Debian testing Kernel 2.6.26
zerosumgame (1429741) kernel 2.6.21 on older Dell 1850's
Wibla (677432) Both my fileservers running debian etch installed from
custom install media (pre-etchnhalf) running 2.6.21-2 and 2.6.21-6
crashed,
Maow (620678) Ubuntu 8.04 on AMD64
Burdell (228580) <burdell@...ntheinter.net> RHEL 4 server
RHEL 4 update 6
kernel-smp-2.6.9-67.0.7.EL.x86_64
ntp-4.2.0.a.20040617-6.el4.x86_64
Penguin Computing Altus 2600
dual dual-core Opteron 2212 HE
4G RAM
nVidia MCP55 chipset
I have 9 servers (mostly
different hardware, but one the same as above), all running the exact
same kernel and package set. Only one crashed; the others logged the
leap second and went on fine.
Pretzalzz (577309) Travis Crump <pretzalz@...hhouse.org>
http://lists.debian.org/debian-user/2009/01/msg00006.html
debian lenny ntp=1:4.2.4p4+dfsg-7;
Linux version 2.6.26-1-amd64 (Debian 2.6.26-4[since updated])
Processor : 2x AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
kst (168867) Ubuntu 8.10
arodland (127775) Debian 2.6.21
Goodgerster (904325) <goodgerster AT gmail DOT com>
Debian
Morgor (542294) Fedora 8 Two of our production servers running fedora 8
Lightjumper (532700) Fedora 9 server
blit (90883) 10 machines running Fedora Core 7
Qwell (684661) Ubuntu 8.10
Jim Fenton (514449) Fedora 8 machine (kernel: 2.6.26.6-49.fc8)
Anonymous Coward My laptop
dmrobbin (560931) F8 box went down,
Anonymous Coward F8 server also "hung"
Anonymous Coward 2.6.26-1-amd64. Four of the 20 locked up
Anonymous Coward (actually, me, Linas) amd64 dual core, ubuntu 8.04
custom-compiled kernel, 2.6.26-64-bit
Anonymous Coward kernel 2.6.26.3-29.fc9.
Anonymous Coward 3 Ubuntu 8.10 Virtual Box (3 of them)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists