Message-ID: <1599.1186974562@turing-police.cc.vt.edu>
Date:	Sun, 12 Aug 2007 23:09:22 -0400
From:	Valdis.Kletnieks@...edu
To:	Folkert van Heusden <folkert@...heusden.com>
Cc:	roland <devzero@....de>, linux-kernel@...r.kernel.org
Subject: Re: Software based ECC ?

On Sun, 12 Aug 2007 18:51:31 +0200, Folkert van Heusden said:

> a question and an idea: Q: is ecc guaranteed to detect all bitflips?

It depends on the exact ECC function the hardware implements.  Typically it
provides guarantees along these lines:

"Correct all 1-bit errors.  Detect, but do not correct, all 2-bit errors and
most errors of 3 or more bits."

(Of course, "correct all 1 or 2 bit and detect all 3 bit" can be done, it
just takes more bits of ECC.)
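
To make the "correct 1, detect 2" behaviour concrete, here is a minimal sketch
using a toy extended Hamming (8,4) code.  This is an illustration only: real
memory controllers use much wider codes, and the function names and bit layout
below are assumptions made for the example, not anything a particular chipset
implements.  It makes no attempt to model what real hardware does with three
or more flips.

```python
def encode(nibble):
    """Encode a 4-bit value as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 form a Hamming(7,4)
    codeword with parity bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) & 1             # makes the total parity even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions of all set bits
    overall = sum(code) & 1                 # 0 for any valid codeword

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

Flipping any one bit of `encode(n)` should decode back to `n` with status
'corrected'; flipping any two should come back as 'uncorrectable' rather than
as wrong data.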

> Idea: what about a multicore system (3 or more) that runs the same
> processes on 2 cores and a third core verifying that they both do the
> same? As I think it is not only ram that can become faulty.

This is actually done for high-reliability systems (Google for "tell me twice"
and "tell me three times").  The problem is that it takes a lot of extra
hardware.  The G5 and later IBM Z-series mainframe chipsets (not to be confused with
the PowerPC G5) implemented dual computation units and a comparator that
signals a 'Machine Check' condition if the two CPUs don't end up in the
exact same state.  As an added bonus, at the end of each instruction where
both *do* compare good, it latches the *entire* state of the CPU out.  When
a comparison fails, it then does the following:

1) Retry the instruction on the same CPU - if it compares correctly, keep
going and flag a "soft" error.

2) If it still fails, read out the last "known good" status latch, and load
it into a spare CPU, and fire it up, and flag the failing one as bad.
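
The compare / retry / fail-over sequence above can be sketched in software
roughly as follows.  This is a toy model under stated assumptions, not how the
mainframe hardware is built: the `Unit`, `LockstepCPU` and `run` names are
hypothetical, the whole "CPU state" is reduced to a single integer, and faults
are injected artificially by flipping the low bit of a result.

```python
class Unit:
    """Hypothetical compute unit; corrupts its next `faults` results."""

    def __init__(self, faults=0):
        self.faults = faults

    def execute(self, state, instr):
        result = instr(state)
        if self.faults > 0:
            self.faults -= 1
            return result ^ 1        # inject a single bit flip
        return result


class LockstepCPU:
    """Two units running the same instruction, plus a comparator."""

    def __init__(self, unit_a, unit_b):
        self.unit_a, self.unit_b = unit_a, unit_b

    def execute(self, state, instr):
        ra = self.unit_a.execute(state, instr)
        rb = self.unit_b.execute(state, instr)
        return ra if ra == rb else None   # None stands in for a machine check


def run(program, cpu, spare, state=0):
    """Run `program` (a list of state -> state functions) with retry and fail-over."""
    checkpoint = state               # last "known good" state latch
    log = []
    for instr in program:
        result = cpu.execute(checkpoint, instr)
        if result is None:
            # Step 1: retry the instruction on the same CPU.
            result = cpu.execute(checkpoint, instr)
            if result is not None:
                log.append("soft error")
            else:
                # Step 2: load the known-good state into the spare CPU.
                log.append("cpu marked bad, switched to spare")
                cpu = spare
                result = cpu.execute(checkpoint, instr)
                if result is None:
                    raise RuntimeError("spare CPU also failed to compare")
        checkpoint = result          # both units agreed: latch the new state
    return checkpoint, log
```

With a program such as `[lambda s: s + 1, lambda s: s * 2, lambda s: s + 3]`,
a unit that glitches once should produce a "soft error" entry and the same
final state as a clean run, while a unit that keeps glitching should trigger
the switch to the spare.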

http://www.research.ibm.com/journal/rd/435/spainhower.pdf
http://www.research.ibm.com/journal/rd/435/mueller.pdf

These guys have forgotten more about designing highly reliable systems than
most of us will ever know. ;)

Needless to say, not everybody is willing to pay the costs of the hardware
overhead of this approach.  

