linux-kernel - Re: [PATCH] [0/16] POISON: Intro

Message-ID: <20090408062107.GE17934@one.firstfloor.org>
Date:	Wed, 8 Apr 2009 08:21:07 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Andi Kleen <andi@...stfloor.org>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, x86@...nel.org
Subject: Re: [PATCH] [0/16] POISON: Intro

On Tue, Apr 07, 2009 at 10:47:09PM -0700, Andrew Morton wrote:
> On Tue,  7 Apr 2009 17:09:56 +0200 (CEST) Andi Kleen <andi@...stfloor.org> wrote:
> 
> > Upcoming Intel CPUs have support for recovering from some memory errors. This
> > requires the OS to declare a page "poisoned", kill the processes associated
> > with it and avoid using it in the future. This patchkit implements
> > the necessary infrastructure in the VM.
> 
> Seems that this feature is crying out for a testing framework (perhaps
> it already has one?). 

Several, in fact.

Two of them are:

git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
(test suite covering various cases)

git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
(injector using the x86-specific error injection hooks I posted
earlier)

Then I have some tests using the madvise MADV_POISON hook
(which test the various cases from a process's standpoint
and recover). These are still a little hackish, but if there's
interest I can put them out. At least one test case
is known to hang (nonlinear mappings); I'm still looking
at that.
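
A minimal sketch of what a process-level test of this kind could look
like (this is not the test code mentioned above). It assumes the hook is
reachable as MADV_HWPOISON with value 100; the patchkit under discussion
calls it MADV_POISON, so the constant is an assumption, as are the helper
names classify_child, poison_and_touch and run_once. A child process
dirties one anonymous page, poisons it, and touches it again; the parent
only looks at how the child ended. Poisoning is expected to need root and
a kernel built with the feature.

```python
#!/usr/bin/env python3
"""Sketch: poison one anonymous page in a child and report how it ends."""
import mmap
import os
import signal

# Assumption: the hook is exposed as MADV_HWPOISON (100); the patchkit
# discussed in this thread names it MADV_POISON.
MADV_HWPOISON = getattr(mmap, "MADV_HWPOISON", 100)


def classify_child(status):
    """Turn a raw wait status into a short verdict string."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        if sig == signal.SIGBUS:
            return "recovered: child got SIGBUS"
        return "unexpected signal %d" % sig
    if os.WIFEXITED(status):
        return "child exited with code %d" % os.WEXITSTATUS(status)
    return "unknown status %d" % status


def poison_and_touch():
    """Runs in the child: map a page, dirty it, poison it, touch it again."""
    page = mmap.mmap(-1, mmap.PAGESIZE,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    page[0] = 1                                   # fault the page in
    page.madvise(MADV_HWPOISON, 0, mmap.PAGESIZE)  # raises OSError if refused
    return page[0]                                # access after poisoning


def run_once():
    """Fork a victim child and classify its fate from the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            poison_and_touch()
        except OSError:
            os._exit(2)   # madvise refused: no privilege or no kernel support
        os._exit(1)       # access returned normally: page was not poisoned
    _, status = os.waitpid(pid, 0)
    return classify_child(status)


if __name__ == "__main__":
    print(run_once())
```

Running the victim in a forked child keeps the test harness itself alive
whichever way the kernel delivers the error. Note that a successful run
takes a real page out of service until reboot, so this belongs on a test
machine.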

The long-term plan was to put both mce-test above and the
MADV_POISON tests into LTP.

There are also a few random hacks, but coverage is still not 100%.

> A simplistic approach would be

Random kill anywhere is hard to test because your system will
die regularly and randomly. mce-test.git does some automated
testing of fatal errors by catching them using kexec, but we haven't
tried that for full recovery.

> 
> 	echo some-pfn > /proc/bad-pfn-goes-here
> 
> A slightly more sophisticated version might do the deed from within a
> timer interrupt, just to get a bit more coverage.

mce-test/inject does it from other CPUs with smp_call_function_single,
so it's already fairly random. I've considered using NMIs too,
but at least the high-level recovery code synchronizes to work queue
context first anyway, so that wouldn't buy us much.

-Andi
-- 
ak@...ux.intel.com -- Speaking for myself only.