linux-kernel - Re: [PATCH v2 0/3] support for broken memory modules (BadRAM)

Message-ID: <BANLkTim=N=8G+Q9HJ6BaMO8L3oZouanxvtsf99fVxYGquTewDg@mail.gmail.com>
Date:	Wed, 22 Jun 2011 11:11:51 -0700
From:	Nancy Yuen <yuenn@...gle.com>
To:	Randy Dunlap <rdunlap@...otime.net>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Stefan Assmann <sassmann@...nic.de>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, tony.luck@...el.com,
	andi@...stfloor.org, mingo@...e.hu, hpa@...or.com,
	rick@...rein.org, Michael Ditto <mditto@...gle.com>
Subject: Re: [PATCH v2 0/3] support for broken memory modules (BadRAM)

I haven't had time to submit the patches, though it's on my todo list.

----------
Nancy



On Wed, Jun 22, 2011 at 11:09, Randy Dunlap <rdunlap@...otime.net> wrote:
> On Wed, 22 Jun 2011 11:00:34 -0700 Andrew Morton wrote:
>
>> On Wed, 22 Jun 2011 13:18:51 +0200 Stefan Assmann <sassmann@...nic.de> wrote:
>>
>> > Following the RFC for the BadRAM feature, here's the updated version with
>> > spelling fixes (thanks go to Randy Dunlap). Also, the code is now less verbose,
>> > as requested by Andi Kleen.
>> > v2 with even more spelling fixes suggested by Randy.
>> > Patches are against vanilla 2.6.39.
>> >
>> > The idea is to allow the user to specify RAM addresses that shouldn't be
>> > touched by the OS, because they are broken in some way. Not all machines have
>> > hardware support for hwpoison, ECC RAM, etc., so here's a solution that lets
>> > the user mask address patterns with bitmasks via the new "badram" kernel
>> > command line parameter.
>> > Memtest86 has had an option to generate these patterns since v2.3, so the only
>> > thing for the user to do should be:
>> > - run Memtest86
>> > - note down the pattern
>> > - add badram=<pattern> to the kernel command line
>> >
>> > The affected pages are then marked with the hwpoison flag and thus won't be
>> > used by the memory management system.
>>
>> The google kernel has a similar capability.  I asked Nancy to comment
>> on these patches and she said:
>>
>> : One, the bad addresses are passed via the kernel command line, which
>> : has a limited length.  It's okay if the addresses can be fit into a
>> : pattern, but that's not necessarily the case in the google kernel.  And
>> : even with patterns, the command line length limit caps the
>> : number of patterns that the user can specify.  Instead we use lilo to pass
>> : a file containing the bad pages in e820 format to the kernel.
>> :
>> : Second, the BadRAM patch expands the address patterns from the command
>> : line into individual entries in the kernel's e820 table.  The e820
>> : table is a fixed buffer that supports a very small, hard-coded number
>> : of entries (128).  We require a much larger number of entries (on
>> : the order of a few thousand), so much of the google kernel patch deals
>> : with expanding the e820 table. Also, with the BadRAM patch, entries
>> : that don't fit in the table are silently dropped and this isn't
>> : appropriate for us.
>> :
>> : Another caveat applies to mapping out too much bad memory in general.  If
>> : too much memory is removed from low memory, a system may not boot.  We
>> : solve this by generating good maps.  Our userspace tools do not map out
>> : memory below a certain limit, and they verify against a system's iomap
>> : that only RAM addresses are mapped out.
>>
>> I have a couple of thoughts here:
>>
>> - If this patchset is merged and a major user such as google is
>>   unable to use it and has to continue to carry a separate patch then
>>   that's a regrettable situation for the upstream kernel.
>>
>> - Google's is, afaik, the largest use case we know of: zillions of
>>   machines for a number of years.  And this real-world experience tells
>>   us that the badram patchset has shortcomings.  Shortcomings which we
>>   can expect other users to experience.
>>
>> So.  What are your thoughts on these issues?
>
>
> Good comments, so where is google's patch submittal?
>
> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>
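To make the bitmask idea in Stefan's cover letter concrete, here is a minimal Python sketch of how one address/mask pattern can be expanded into the 4 KiB pages it touches. It assumes the usual BadRAM convention that a pattern is an (addr, mask) pair and that an address x is bad when (x & mask) == (addr & mask). The comma-separated "addr,mask[,addr,mask...]" syntax, the 4 KiB page size, and the helper names parse_badram, page_matches and bad_pfns are assumptions for illustration, not code from the patch set.

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def parse_badram(arg):
    """Parse a hypothetical 'addr,mask[,addr,mask...]' string into pairs."""
    vals = [int(v, 0) for v in arg.split(",")]   # base 0 accepts 0x... hex
    if len(vals) % 2:
        raise ValueError("badram expects addr,mask pairs")
    return list(zip(vals[0::2], vals[1::2]))


def page_matches(addr, mask, pfn):
    """True if some address x in page `pfn` has (x & mask) == (addr & mask).

    The offset bits inside a page can always be chosen to match, so only
    the bits above the page offset have to agree under the mask.
    """
    page_start = pfn << PAGE_SHIFT
    return ((page_start ^ addr) & mask & ~(PAGE_SIZE - 1)) == 0


def bad_pfns(patterns, start_pfn, end_pfn):
    """Page frame numbers in [start_pfn, end_pfn) hit by any pattern."""
    return [pfn for pfn in range(start_pfn, end_pfn)
            if any(page_matches(a, m, pfn) for a, m in patterns)]


if __name__ == "__main__":
    patterns = parse_badram("0x00100000,0xfffef000")
    # Bit 16 is clear in the mask, so it is "don't care".
    print([hex(p) for p in bad_pfns(patterns, 0, 0x200)])  # ['0x100', '0x110']
```

In this example the single pattern leaves bit 16 free, so within the first 2 MiB it hits exactly two pages. Because one pattern can cover many pages, a handful of patterns can describe a lot of bad memory, which is also why expanding them into individual table entries can outgrow a fixed-size table.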
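Nancy's points about the fixed-size e820 table and about keeping low memory in service can be illustrated with a small Python sketch of a userspace map generator. It is not Google's tool or the kernel's e820 code: only the 128-entry capacity comes from the thread, while the 1 MiB floor, the half-open (start, end) byte ranges, the merging of adjacent ranges, and the names build_reserved_table and TableFull are assumptions chosen for illustration.

```python
E820_MAX_ENTRIES = 128    # fixed table size quoted in the thread
LOW_MEM_FLOOR = 0x100000  # hypothetical: never map out memory below 1 MiB


class TableFull(Exception):
    """Raised instead of silently dropping ranges that do not fit."""


def build_reserved_table(bad_ranges, ram_ranges,
                         capacity=E820_MAX_ENTRIES, floor=LOW_MEM_FLOOR):
    """Turn bad byte ranges into a compact list of ranges to reserve.

    bad_ranges and ram_ranges are lists of half-open (start, end) byte
    ranges; ram_ranges stands in for the RAM entries of the system's iomap.
    Each bad range must lie inside a single RAM range.
    """
    table = []
    for start, end in sorted(bad_ranges):
        start = max(start, floor)  # keep low memory in service so it boots
        if start >= end:
            continue  # range lies entirely below the floor
        if not any(r_start <= start and end <= r_end
                   for r_start, r_end in ram_ranges):
            raise ValueError(f"range {start:#x}-{end:#x} is not RAM per iomap")
        if table and start <= table[-1][1]:
            # Overlapping or adjacent: extend the previous entry, no new slot.
            table[-1] = (table[-1][0], max(table[-1][1], end))
            continue
        if len(table) == capacity:
            raise TableFull(f"more than {capacity} entries needed")
        table.append((start, end))
    return table
```

The design choice here is to fail loudly when the table would overflow rather than drop entries, which is the behaviour Nancy says Google needs. Merging adjacent pages saves slots, but with thousands of scattered bad pages the table still has to grow, which is what most of the Google patch reportedly deals with.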