Date: Sat, 19 Jul 2008 12:37:11 +0200
From: Andi Kleen <andi@...stfloor.org>
To: Russ Anderson <rja@....com>
Cc: mingo@...e.hu, tglx@...utronix.de, Tony Luck <tony.luck@...el.com>,
    linux-kernel@...r.kernel.org, linux-ia64@...r.kernel.org
Subject: Re: [PATCH 0/2] Migrate data off physical pages with corrected memory errors (Version 7)

Russ Anderson <rja@....com> writes:

> [PATCH 0/2] Migrate data off physical pages with corrected memory errors (Version 7)

FWIW I discussed this with some hardware people, and the general opinion
was that it is way too aggressive to disable a page on the first
corrected error, as this patchkit currently does.

A corrected bit error could be caused by a temporary condition, e.g. in
the DIMM link, and does not necessarily mean that part of the DIMM is
really going bad. Permanently disabling a page would only be justified
if you saw repeated corrected errors from the same DIMM over a long time.

There are also some potential scenarios where being so aggressive could
hurt. For example, if you have a low rate of corrected events spread
randomly all over your memory (e.g. with a flaky DIMM connection), then
after a long enough uptime you could lose significant parts of your
memory even though the DIMM is actually still OK.

Another issue is that if the DIMM is going bad, it is likely that larger
areas are affected than just the lines making up this page. So you would
still risk uncorrected errors anyway, because disabling the page would
only cover a small subset of the affected area.

If you really wanted to do this, you should probably hook it up to
mcelog's (or the IA64 equivalent's) DIMM database and then control it
from user space with suitably large thresholds and DIMM-specific
knowledge. But it is unlikely that this can be done nicely in a way that
is isolated from very specific knowledge about the underlying memory
configuration.
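The "random scatter" scenario can be made concrete with a little arithmetic. The sketch below assumes (this is a modelling assumption, not something measured in the thread) that each corrected error lands on a uniformly random page and that the first error on a page disables it, as the patchkit does. Under that model the expected fraction of disabled pages after N errors over P pages is 1 - (1 - 1/P)^N, which is roughly N/P while N is small compared with P, and it only ever grows with uptime because disabled pages are never returned. The function names and page size are illustrative only.

```python
def expected_fraction_disabled(total_pages: int, errors: int) -> float:
    """Expected fraction of pages disabled when each corrected error hits a
    uniformly random page and the first error on a page disables it
    (hypothetical model of a disable-on-first-error policy)."""
    if total_pages < 1 or errors < 0:
        raise ValueError("total_pages must be >= 1 and errors >= 0")
    # Probability a given page is never hit by any of the errors.
    never_hit = (1.0 - 1.0 / total_pages) ** errors
    return 1.0 - never_hit


def expected_bytes_lost(memory_bytes: int, page_size: int, errors: int) -> float:
    """Expected amount of memory taken away under the same model."""
    total_pages = memory_bytes // page_size
    return expected_fraction_disabled(total_pages, errors) * total_pages * page_size


# Example with hypothetical numbers: 16 GiB of 4 KiB pages and one
# spurious corrected error per hour for a year of uptime.
pages = (16 << 30) // 4096
print(expected_fraction_disabled(pages, 24 * 365))
print(expected_bytes_lost(16 << 30, 4096, 24 * 365) / (1 << 20), "MiB")
```

The loss scales linearly with the error rate and the uptime, so a connection that produces errors much more often than the assumed rate costs proportionally more memory, even though no cell is actually failing.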
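The user-space alternative suggested above, acting only after repeated corrected errors from the same DIMM over a long time, could look roughly like the following. This is a minimal sketch under stated assumptions: the class name, the DIMM labels, and the threshold and window values are all hypothetical, it does not use mcelog's real database or interfaces, and it assumes timestamps arrive in non-decreasing order per DIMM. What to do once a DIMM crosses the threshold (report it for replacement, offline pages, etc.) is deliberately left open.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class CorrectedErrorTracker:
    """Hypothetical per-DIMM policy: flag a DIMM only after `threshold`
    corrected errors within a sliding window of `window_s` seconds."""

    def __init__(self, threshold: int, window_s: float) -> None:
        if threshold < 1 or window_s <= 0:
            raise ValueError("threshold must be >= 1 and window_s > 0")
        self.threshold = threshold
        self.window_s = window_s
        # Timestamps of recent corrected errors, keyed by DIMM label.
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, dimm: str, timestamp: float) -> bool:
        """Record one corrected error; return True if the DIMM has now seen
        at least `threshold` errors inside the window."""
        events = self._events[dimm]
        events.append(timestamp)
        # Forget errors older than the window, so isolated or transient
        # events never accumulate into a permanent decision.
        while events and timestamp - events[0] > self.window_s:
            events.popleft()
        return len(events) >= self.threshold


# Usage with made-up DIMM labels: three errors on the same DIMM within a
# day trip the threshold; errors on other DIMMs do not count toward it.
tracker = CorrectedErrorTracker(threshold=3, window_s=24 * 3600)
print(tracker.record("DIMM_A1", 0))
print(tracker.record("DIMM_A1", 3600))
print(tracker.record("DIMM_B2", 7200))
print(tracker.record("DIMM_A1", 10800))
```

Counting per DIMM rather than per page follows the point that a failing DIMM usually affects more than one page, and the sliding window keeps a low rate of scattered transient errors from ever reaching the threshold.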
-Andi