Message-ID: <200409241550.i8OFoWd27525@pop-5.dnv.wideopenwest.com>
From: mvp at joeware.net (joe)
Subject: Windoze almost managed to 200x repeat 9/11
From the article:
"The servers are timed to shut down after 49.7 days of use in order to
prevent a data overload, a union official told the LA Times. To avoid this
automatic shutdown, technicians are required to restart the system manually
every 30 days. An improperly trained employee failed to reset the system,
leading it to shut down without warning, the official said."
And
"Soon after installation, however, the FAA discovered that the system design
could lead to a radio system shutdown, and put the maintenance procedure
into place as a workaround, the LA Times said. The FAA reportedly said it
has been working on a permanent fix but has only eliminated the problem in
Seattle. The FAA is now planning to institute a second workaround - an alert
that will warn controllers well before the software shuts down."
It would appear that the VSCS shut down, not the system. Further, it would
appear that someone failed to reboot the system and caused this, not that
the system hung or died mid-restart.
This article, combined with other discussions, makes it sound like the app
itself had an issue; the system didn't crash or drop. Kernel memory wasn't
corrupted, etc.
The fact that they want it rebooted, and the time frame mentioned (49.7
days, which coincides exactly with the point at which the 32-bit DWORD
returned by GetTickCount rolls over to 0), means they are probably basing
some timing info on the output of GetTickCount and can't properly handle the
rollover. GetTickCount returns the number of milliseconds elapsed since the
system was started.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/gettickcount.asp
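As a quick check of the 49.7-day figure: the millisecond count from GetTickCount lives in a 32-bit DWORD, so it wraps after 2^32 ms. A minimal sketch of the arithmetic, using Python purely as a calculator (the function name is illustrative, not part of any API):

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in one day

def days_until_wrap(bits: int) -> float:
    """Days until an unsigned millisecond counter of this width wraps to 0."""
    return (2 ** bits) / MS_PER_DAY

# 32-bit DWORD, the width of the GetTickCount return value
print(f"{days_until_wrap(32):.2f} days")  # prints 49.71 days
```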
The options are to have a thread manage your own timer values using a
floating-point type or a 64-bit integer, to use the 64-bit high-resolution
timers (all of which just push the problem further out, and all of which are
available right now and have been for some time), or to properly handle the
datatype being used.
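"Properly handling the datatype" mostly comes down to modular arithmetic. In C, subtracting two DWORD tick readings as unsigned values already gives the right elapsed time across one rollover; the classic bugs come from comparing absolute tick values (e.g. `now >= start + timeout`) or widening to a larger or signed type before subtracting. The sketch below emulates 32-bit unsigned math in Python by masking, and also shows the "manage your own 64-bit timer" option. `elapsed_ms` and `TickExtender` are hypothetical helpers for illustration, not Windows APIs, and the tick values are made up:

```python
MASK32 = 0xFFFFFFFF  # emulate C's 32-bit unsigned DWORD arithmetic

def elapsed_ms(start: int, now: int) -> int:
    """Wrap-safe elapsed time between two 32-bit tick readings.

    Same result as C's (DWORD)(now - start): correct across a rollover,
    provided the true interval is shorter than 2**32 ms (~49.7 days).
    """
    return (now - start) & MASK32

class TickExtender:
    """Hypothetical helper that widens a 32-bit tick counter to 64 bits.

    It must see a reading at least once per wrap period (~49.7 days),
    otherwise an entire rollover can go unnoticed.
    """

    def __init__(self) -> None:
        self.last = None  # previous 32-bit reading
        self.wraps = 0    # number of rollovers observed so far

    def update(self, tick32: int) -> int:
        if self.last is not None and tick32 < self.last:
            self.wraps += 1  # the counter went "backwards", so it wrapped
        self.last = tick32
        return (self.wraps << 32) | tick32

# One reading taken 5 s before the rollover, one taken 5 s after it.
start, now = MASK32 - 4999, 5000
print(now - start)             # widened subtraction: -4294957296
print(elapsed_ms(start, now))  # modular subtraction: 10000
```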
A popular option, which is even worse, is to base things on the system
clock. While you don't have to worry about a rollover for a long, long time
with the Windows FILETIME (64-bit) or with epoch time if using ctime, you
then start getting all sorts of timing issues due to time-correction
software or the user changing the time.
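Python's standard library draws the same line between wall-clock time and an interval timer, which makes for a portable illustration: `time.time()` follows the system clock, while `time.monotonic()` cannot go backwards and is not affected by system clock updates. This is only an analogy to the Windows calls above; the `measure` helper and the fake clock readings (simulating someone setting the clock back 60 s mid-measurement) are made up for the example:

```python
import time

def measure(clock, work) -> float:
    """Elapsed seconds for work(), as seen by the given clock function."""
    start = clock()
    work()
    return clock() - start

# Wall clock: an NTP step or a user changing the time during work()
# skews this result, and can even make it negative.
wall = measure(time.time, lambda: time.sleep(0.05))

# Monotonic clock: never goes backwards, so it suits intervals and timeouts.
mono = measure(time.monotonic, lambda: time.sleep(0.05))
print(f"wall: {wall:.3f} s, monotonic: {mono:.3f} s")

# Simulated wall-clock readings with the clock set back 60 s in between.
fake_readings = iter([1000.0, 940.0])
print(measure(lambda: next(fake_readings), lambda: None))  # prints -60.0
```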
Anyway, had they used the high-resolution timers
(QueryPerformanceCounter/QueryPerformanceFrequency) instead of GetTickCount,
they would have been working with an API available since about NT 3.1/Win9x
and would have been using 64-bit integers. If I recall correctly, they
wouldn't have had an issue until the system had been up for something like
100 years (200 if using unsigned), which obviously could NEVER happen with a
Windows system. It's been a while since I worked out the details of those
functions. Still, many coders avoid them because they don't like working
with 64-bit integers.
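How long the performance counter lasts depends on its frequency: QueryPerformanceCounter returns a signed 64-bit LARGE_INTEGER tick count, and QueryPerformanceFrequency reports how many ticks occur per second. As a rough sketch, a counter running at about 3 GHz hits the signed limit after roughly 97 years (about 195 unsigned), in line with the estimate above; slower counters last far longer. The frequencies below are assumptions for illustration only, since the real value comes from QueryPerformanceFrequency on the machine in question:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, in seconds

def years_until_overflow(freq_hz: float, bits: int = 63) -> float:
    """Years of uptime before a counter ticking at freq_hz reaches 2**bits.

    bits=63 models a signed 64-bit counter, bits=64 an unsigned one.
    """
    return (2 ** bits) / freq_hz / SECONDS_PER_YEAR

# Assumed example frequencies: 3.579545 MHz, 10 MHz, and ~3 GHz.
for freq in (3.579545e6, 10e6, 3e9):
    print(f"{freq:>15,.0f} Hz: "
          f"{years_until_overflow(freq):>12,.1f} years signed, "
          f"{years_until_overflow(freq, 64):>12,.1f} years unsigned")
```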
joe
-----Original Message-----
From: Barry Fitzgerald [mailto:bkfsec@....lonestar.org]
Sent: Friday, September 24, 2004 10:15 AM
To: joe
Cc: full-disclosure@...ts.netsys.com
Subject: Re: [Full-Disclosure] Windoze almost managed to 200x repeat 9/11
joe wrote:
Where issues like this relate to the OS is that the OS itself shouldn't be
brought down by a poorly designed app.
Of course, you can shoot yourself in the foot in any OS, but an overflow in
a local app should never take down the kernel. Unfortunately, memory
management in MS Windows (though it's gotten better over time) is still not
up to par, and that is what causes a number of these issues, not to mention
poor system architecture and design on the part of MS.
Was it MS Windows that actually held the code that brought the system down?
Well, that depends on how far down you want to drill and where you place the
burden of OS stability. If you place it on the OS, then Windows is fair
game. If you place the burden of OS stability on the app, then you're
foolish and don't understand OS design concepts. :) (said in jest, but
then, so is most truth)
The article doesn't make the situation entirely clear. Did the app
intentionally restart the system and foul it? Did the restart occur because
the app crashed? I'm skeptical because technical details like this are
usually confused, mislabeled, or misreported... even
(especially?) in tech rags. So, who holds the burden in this case depends
on the answers to the questions above.
-Barry