Message-ID: <20090131134327.GB28763@khazad-dum.debian.net>
Date:	Sat, 31 Jan 2009 11:43:27 -0200
From:	Henrique de Moraes Holschuh <hmh@....eng.br>
To:	Tim Small <tim@...tersideup.com>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	ncunningham-lkml@...a.org.au, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Chris Friesen <cfriesen@...tel.com>,
	Pavel Machek <pavel@...e.cz>, Doug Thompson <norsk5@...oo.com>,
	bluesmoke-devel@...ts.sourceforge.net,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: marching through all physical memory in software

On Sat, 31 Jan 2009, Tim Small wrote:
> Eric W. Biederman wrote:
> > At the point we are talking about software scrubbing it makes sense to assume
> > a least common denominator memory controller, one that does not do automatic
> > write-back of the corrected value, as all of the recent memory controllers
> > do scrubbing in hardware.
> >   
> 
> I was just trying to clarify the distinction between the two processes 
> which have similar names, but aren't (IMO) actually that similar:
> 
> "Software Scrubbing"
> 
> Triggering a read, and subsequent rewrite of a particular RAM location 
> which has suffered a correctable ECC error(s) i.e. hardware detects an 
> error, then the OS takes care of the rewrite to "scrub" the error in the 
> case that the hardware doesn't handle this automatically.
> 
> This should be a very-occasional error-path process, and performance is 
> probably not critical..
> 
> 
> "Background Scrubbing"
> 
> . This is a poor name, IMO (scrub infers some kind of write to me), 
> which applies to a process whereby you ensure that the ECC check-bits 
> are verified periodically for the whole of physical RAM, so that single 
> bit errors in a given ECC block don't accumulate and turn into 
> uncorrectable errors.  It may also lead to improved data collection for 
> some failure modes.  Again, many memory controllers implement this 
> feature in hardware, so we shouldn't do it twice where this is supported.

It is implied in background scrubbing that if a background scrub
page read causes a correctable ECC error to be flagged, the normal
"fix through scrub" behaviour of the memory controller will be
triggered (or possibly the software scrubbing described above).

And if an uncorrectable error is detected during the scrub, we have to
do something about it as well.  And that won't be that easy: locate
whatever process is using that page, and do something smart to it...
or take some emergency evasive action if it is one of the kernel's data
structures, etc.

So, as you said, "background scrubbing" and "software scrubbing" really are
very different things, and one has to expect that background scrubbing will
eventually trigger software scrubbing, major system emergency handling
(uncorrectable errors in kernel memory) or minor system emergency
handling (uncorrectable errors in process memory).
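
To make the three outcomes concrete, here is a rough userspace sketch of
the dispatch a background scrubber would have to do.  It is only a model:
Page, EccStatus, Owner and the action names are made up for illustration,
and reading page.status stands in for the real read that makes the memory
controller check ECC.  Nothing here is an existing kernel or EDAC interface.

```python
from dataclasses import dataclass
from enum import Enum, auto


class EccStatus(Enum):
    OK = auto()
    CORRECTABLE = auto()
    UNCORRECTABLE = auto()


class Owner(Enum):
    KERNEL = auto()
    PROCESS = auto()


@dataclass
class Page:
    # Hypothetical model of a physical page: who owns it, and what the
    # memory controller would report if the page were read right now.
    owner: Owner
    status: EccStatus = EccStatus.OK


def software_scrub(page):
    """Model the rewrite of corrected data, which clears the stored error."""
    page.status = EccStatus.OK


def background_scrub(pages):
    """Read every page once and return the (index, action) pairs taken."""
    actions = []
    for index, page in enumerate(pages):
        status = page.status  # stands in for the ECC-checked read
        if status is EccStatus.CORRECTABLE:
            software_scrub(page)
            actions.append((index, "software_scrub"))
        elif status is EccStatus.UNCORRECTABLE:
            if page.owner is Owner.KERNEL:
                actions.append((index, "major_emergency"))
            else:
                actions.append((index, "minor_emergency"))
    return actions
```

On hardware that writes back corrected data by itself, the software_scrub
step would be unnecessary; the two emergency branches are where the hard
work would be.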

> There is (AFAIK) no need to do any writes here, and in fact doing so is 

One might want the possibility of doing unconditional writes, because
it helps with memory bitrot on crappy hardware where the refresh
cycles aren't enough to avoid bitrot.  But you definitely won't want
it most of the time.

You can also implement software-based ECC using a background scrubber
and setting aside pages to store the ECC information.  Now, THAT is
probably not worth bothering with due to the performance impact, but
who knows...
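
For what it is worth, a toy version of the software ECC idea could look
like the sketch below.  It assumes a Hamming-style check (XOR of the
1-based positions of the set bits, plus an overall parity bit) kept
separately from the data, as it would be in set-aside pages, and it
assumes the check information itself is not corrupted.  The function names
are invented for the example, and this says nothing about what a real
implementation should use.

```python
def make_check(word):
    """Return (syndrome, parity) for a non-negative integer data word.

    syndrome is the XOR of the 1-based positions of all set bits,
    parity is the number of set bits modulo 2.
    """
    syndrome = 0
    parity = 0
    position = 1
    while word:
        if word & 1:
            syndrome ^= position
            parity ^= 1
        word >>= 1
        position += 1
    return syndrome, parity


def scrub_word(word, stored_check):
    """Verify word against its stored check; return (status, word).

    A single flipped data bit is corrected.  A double flip is reported
    as uncorrectable.  Three or more flips may be miscorrected, as with
    any single-correct/double-detect code.
    """
    syndrome, parity = make_check(word)
    diff = syndrome ^ stored_check[0]
    parity_changed = parity != stored_check[1]
    if diff == 0 and not parity_changed:
        return "ok", word
    if diff != 0 and parity_changed:
        # An odd number of flips; assume exactly one, at position diff.
        return "corrected", word ^ (1 << (diff - 1))
    return "uncorrectable", word
```

The scrubber would call make_check() whenever a word is written and
scrub_word() on each pass, which is exactly where the performance cost
comes from: every write to protected memory has to update the set-aside
check data as well.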

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh