Message-ID: <D0D0330CBD07114D85B70B784E80C2F20DEF31@exch-mail2.win.slac.stanford.edu>
From: gtb at slac.stanford.edu (Buhrmaster, Gary)
Subject: Windoze almost managed to 200x repeat 9/11
> -----Original Message-----
> From: full-disclosure-admin@...ts.netsys.com
> [mailto:full-disclosure-admin@...ts.netsys.com] On Behalf Of Troy
> Sent: Saturday, September 25, 2004 12:41 PM
> To: full-disclosure@...ts.netsys.com
> Subject: Re: [Full-Disclosure] Windoze almost managed to 200x
> repeat 9/11
>
....
> I think the worst thing about this is that the FAA and the developers
> of the app knew about the problem for quite some time, knew what the
> problem was, and, rather than fix the code, they just rebooted the
> system to work around it and ignored the main problem.
>
> --
> Troy
This type of workaround is typical in mission-critical code,
where the cost of recertification exceeds the cost of the
workaround. It is not enough to show that the (corrected)
program logic solves a known problem. The (unknown) side
effects of adding code (and of moving code/data in memory,
which may expose another bug/feature) mean that the entire
system must be recertified before a change can move to
production. Recertification of a large system can easily
run into the millions (when you look at fully encumbered
costs).

The real issue was not that there was a known problem
(all large systems have bugs/features), nor that a choice
was made to apply a workaround rather than correct the
root cause. It was that the workaround did not account
for its own failure mode: the person (apparently)
failing to do his/her job of restarting the system.
There should have been checks in place to ensure that
the workaround was actually performed. I'll bet that the
FAA is now instituting such checks.
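
As a rough illustration (a minimal sketch in Python; the
30-day threshold, the /proc/uptime source, and the alert
path are my own assumptions, not details of the actual FAA
system), such a check can be as simple as comparing uptime
against the interval the workaround allows:

    #!/usr/bin/env python
    # Hypothetical uptime watchdog sketch; the limit and the
    # alerting below are assumptions, not FAA specifics.

    MAX_UPTIME_SECONDS = 30 * 24 * 60 * 60  # restart due every 30 days

    def read_uptime_seconds():
        # Linux keeps seconds-since-boot in /proc/uptime; a
        # Windows box would query its tick counter instead.
        with open("/proc/uptime") as f:
            return float(f.read().split()[0])

    def workaround_performed():
        uptime = read_uptime_seconds()
        if uptime > MAX_UPTIME_SECONDS:
            # A production check would page an operator or
            # start a supervised restart, not just print.
            print("ALERT: up %.0f seconds; scheduled restart overdue"
                  % uptime)
            return False
        return True

    if __name__ == "__main__":
        workaround_performed()

Naturally, such a check should run from cron or, better, a
separate supervisory host, so that it does not fail along
with the machine it is watching.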
But, of course, hindsight is 20/20.
Gary