Message-ID: <20080502185030.GH7246@sgi.com>
Date:	Fri, 2 May 2008 13:50:30 -0500
From:	Russ Anderson <rja@....com>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	linux-kernel@...r.kernel.org, linux-ia64@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tony Luck <tony.luck@...el.com>,
	Christoph Lameter <clameter@....com>
Subject: Re: [PATCH 3/3] ia64: Call migration code on correctable errors v2

On Fri, May 02, 2008 at 07:45:55PM +0200, Andi Kleen wrote:
> Russ Anderson <rja@....com> writes:
> 
> > Migrate data off pages with correctable memory errors.  This patch is the 
> > ia64 specific piece.  It connects the CPE handler to the page migration
> > code.  It is implemented as a kernel loadable module, similar to the mca
> > recovery code (mca_recovery.ko).  This allows the feature to be turned off
> > by uninstalling the module.  Creates /proc/badram to display bad page
> > information and free bad pages.
> 
> How do you know what pages have excessive errors? And how is excessive defined?
> Surely you don't keep  a per page error count? It's unclear from your patch. 

The code migrates on the first correctable error on a page, so "excessive"
is one.  Yes, keeping a per-page count gets problematic, especially as
memories get larger.  The right metric for when to migrate will always be
debatable, and the "right" answer will likely depend on the physical
memory technology.
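
As a rough illustration of the policy question (a userspace sketch, not
code from the patch; struct cpe_policy and cpe_should_migrate() are
made-up names), the decision could be expressed as a threshold check.
The patch as posted corresponds to a threshold of one, which is why no
per-page count is needed:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy knob: migrate once a page has seen this many
 * correctable errors.  A threshold of 1 matches "migrate on the first
 * correctable error". */
struct cpe_policy {
	unsigned int threshold;
};

/* Returns true when the page should be migrated, given the number of
 * correctable errors seen on it so far (including the current one). */
static bool cpe_should_migrate(const struct cpe_policy *policy,
			       unsigned int errors_seen)
{
	assert(policy->threshold > 0);
	return errors_seen >= policy->threshold;
}
```

Any threshold above one would need some per-page bookkeeping to supply
errors_seen, which is the cost mentioned above.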

> Anyways I don't think this should be ia64 specific, but generic code.

The actual migration code is generic, in mm/migrate.c.

The ia64 kernel module piece is very arch specific.  It ties into the
ia64 CPE handler and gets the page address from the CPE record, based
on the ia64 error handling spec.  Each arch will have a different way of
determining the physical address of the correctable error, for example.
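
To make the split concrete, here is a minimal userspace sketch (not
taken from the patch; the hook type, the function names and the record
layout are all hypothetical).  The arch side supplies a callback that
extracts the physical address from its own error record, and the common
side only turns that address into a page frame number:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-arch hook: pull the physical address out of an
 * arch-specific error record.  Returns 0 on success, -1 if the record
 * carries no usable address. */
typedef int (*cpe_get_paddr_fn)(const void *record, uint64_t *paddr);

/* Common part: turn the reported address into a page frame number. */
static int cpe_record_to_pfn(cpe_get_paddr_fn get_paddr, const void *record,
			     unsigned int page_shift, uint64_t *pfn)
{
	uint64_t paddr;

	assert(page_shift < 64);
	if (get_paddr(record, &paddr) != 0)
		return -1;
	*pfn = paddr >> page_shift;
	return 0;
}

/* Stand-in for an arch implementation; this record layout is made up
 * and does not reflect the real ia64 CPE record format. */
struct fake_cpe_record {
	int has_addr;
	uint64_t paddr;
};

static int fake_get_paddr(const void *record, uint64_t *paddr)
{
	const struct fake_cpe_record *r = record;

	if (!r->has_addr)
		return -1;
	*paddr = r->paddr;
	return 0;
}
```

Only the callback would differ per arch in a layout like this; the page
frame number is what the generic migration side would then work from.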

> I also have my doubts about making such small code subsystems modules. Modules
> always get rounded to pages so it ultimately just wastes memory.

CONFIG_IA64_CPE_MIGRATE=m builds it as a module.
CONFIG_IA64_CPE_MIGRATE=y builds it as part of the kernel.

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@....com