full-disclosure - Windoze almost managed to 200x repeat 9/11

Message-ID: <41543A02.8010709@sdf.lonestar.org>
From: bkfsec at sdf.lonestar.org (Barry Fitzgerald)
Subject: Windoze almost managed to 200x repeat 9/11

Frank Knobbe wrote:

>On Fri, 2004-09-24 at 09:15, Barry Fitzgerald wrote:
>
>>The article doesn't make the situation entirely clear.  Did the app 
>>intentionally restart the system and foul it?  Did the restart occur 
>>because the app crashed?  
>
>No, no, the problem was "human error" because a tech didn't reboot the
>system. It's clearly operator error, not a problem with any systems at
>all. 
>

I disagree - if the system were engineered properly, a reboot would not 
be necessary to keep the system from falling on its face.

The article implied (though it didn't state outright) that the Unix 
systems did not include regular reboots.  I don't know enough about the 
engineering of the system to say whether this was caused by the app, 
the OS, or some dependency issue.

But, in a critical system of this nature, relying on scheduled reboots 
for operation sends a signal to me that there's a problem in the system.

>Unfortunately, there is some truth in this. We (and not just the media)
>are starting to put blame on humans far too quickly. Is this justified?
>On one hand, they are only tools for us to do our job. On the other
>hand, they are products that we should be able to rely on. Who do we
>blame? Operators or products?
>

That depends on the situation.  If a system can be engineered to operate 
properly on its own, then it should be.  All else is operator error.  I 
think it mostly depends on how reasonable it is to expect the task to be 
automated.

If the backup fails because a user forgets to change the backup 
tapes, then the problem is human error.
If the backup fails because the product doesn't properly flush its 
buffers and sends all data to /dev/null, then the issue is software 
error, even if it's a known condition that has had a procedure put in 
place to work around it.  Expecting that step to be automated is 
reasonable, and it is supposed to be part of the system, so the failure 
is an error in the engineering.
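
A minimal sketch of what "properly flushing its buffers" could look like, 
assuming Python.  It is not taken from the article or from any real backup 
product; the function name write_durably and the file path are made up 
for illustration.

```python
import os


def write_durably(path: str, data: bytes) -> int:
    """Write data to path and push it to stable storage before returning.

    Hypothetical helper for illustration only.
    """
    with open(path, "wb") as f:
        written = f.write(data)   # data may still sit in a userspace buffer
        f.flush()                 # hand the buffer over to the OS
        os.fsync(f.fileno())      # ask the OS to commit it to the device
    return written
```

The point is that the program itself does the flush, so no operator 
procedure is needed to make the write stick.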

The second scenario is similar to what we had here.  All a reboot does 
is ensure that the memory has been cleared.  If their developers don't 
know how to do this in code, or if they choose OSes that can't reliably 
do this, then fire the developers, the decision makers, or both, 
because they didn't do their jobs and people could have died because of 
that. 

             -Barry