Message-ID: <48B7A377.8010205@goop.org>
Date: Fri, 29 Aug 2008 00:21:27 -0700
From: Jeremy Fitzhardinge <jeremy@...p.org>
To: Ingo Molnar <mingo@...e.hu>
CC: Rafał Miłecki <zajec5@...il.com>,
Alan Jenkins <alan-jenkins@...fmail.co.uk>,
Hugh Dickens <hugh@...itas.com>,
"H. Peter Anvin" <hpa@...or.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC] x86: check for and defend against BIOS memory corruption

Ingo Molnar wrote:
> * Rafał Miłecki <zajec5@...il.com> wrote:
>
>
>> 2008/8/28 Jeremy Fitzhardinge <jeremy@...p.org>:
>>
>>> Some BIOSes have been observed to corrupt memory in the low 64k. This
>>> patch does three things:
>>> - Reserves all memory which does not have to be in that area, to
>>> prevent it from being used as general memory by the kernel. Things
>>> like the SMP trampoline are still in the memory, however.
>>> - Clears the reserved memory so we can observe changes to it.
>>> - Adds a function check_for_bios_corruption() which checks and reports on
>>> memory becoming unexpectedly non-zero. Currently it's called in the
>>> x86 fault handler, and the power management debug output.
>>>
>>> RFC: What other places should we check for corruption in?
>>>
>>> [ Alan, Rafał: could you check you see:
>>> 1: corruption messages
>>> 2: no crashes
>>> Thanks -J
>>> ]
>>>
>> I was trying my best to crash system with this patch applied and failed :)
>>
>> Works great.
>>
>> Just wonder if I should expect any printk from
>> check_for_bios_corruption? I do not see any:
>>
>> zajec@...y:~> dmesg | grep -i corr
>> scanning 2 areas for BIOS corruption
>>
>
> that's _very_ weird.
>
No, it's expected. Rafał only got corruption when plugging in his HDMI
cable, and I didn't put any corruption checks on that path (I'm not even
sure what kernel code would get executed in that case). Hugh's original
patch put a check in the hot path of the fault handler - and so it would
get called regularly - but I put it in the kernel-bug path, which is
fairly pointless given that we expect this patch to prevent the crashes.
It does, however, do the check on pm state changes, so doing a
suspend should make it print any corruption it has found. Alan's
case would be a better test for that, though.

It does raise the question of where the good places to put the check
are. The path shouldn't be too hot, given that the check scans ~64k of
memory, but it should run often enough to actually show something. I was
thinking of putting some calls in the acpi code itself, but got, erm,
discouraged. Maybe hooking into a sysrq key would be useful (sysrq-m?).
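
To make the cost of such a check concrete, here is a rough user-space
sketch of that kind of scan. It is not the code from the patch:
scan_area_init(), scan_area_check() and SCAN_SIZE are names made up for
illustration, and the real check walks reserved low physical memory
rather than an ordinary buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SCAN_SIZE (64 * 1024)	/* illustrative stand-in for the low 64k */

/* Zero the region so that any later write by someone else is visible. */
void scan_area_init(unsigned char *area, size_t size)
{
	assert(area != NULL);
	memset(area, 0, size);
}

/*
 * Walk the region one word at a time.  Every non-zero word is reported
 * and then cleared again, so the same corruption is not reported twice.
 * Returns the number of corrupted words found.
 */
size_t scan_area_check(unsigned char *area, size_t size)
{
	size_t off, bad = 0;
	unsigned long word;

	assert(area != NULL);
	for (off = 0; off + sizeof(word) <= size; off += sizeof(word)) {
		memcpy(&word, area + off, sizeof(word));
		if (word == 0)
			continue;
		printf("corrupt word at offset %#zx: %#lx\n", off, word);
		memset(area + off, 0, sizeof(word));
		bad++;
	}
	return bad;
}
```

Every call touches the whole region, which is why the call site matters:
a caller would run scan_area_init() once and then scan_area_check() from
whichever path is chosen.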

> maybe the BIOS expects _zeroes_ somewhere? Do you suddenly see crashes
> if you change this line in Jeremy's patch:
>
> + memset(__va(addr), 0, size);
>
> to something like:
>
> + memset(__va(addr), 0x55, size);
>
> If this does not tickle any messages either, then maybe the problem is
> in the identity of the entities we allocate in the first 64K. Is there a
> list of allocations that go there when Jeremy's patch is not applied?
>
> but ... i think with an earlier patch you saw corruption, right?
> Far-fetched idea: maybe it's some CPU erratum during suspend/resume that
> corrupts pagetables if the pagetables are allocated in the first 64K of
> RAM? In that case we should use a bootmem allocation for pagetables that
> give a minimum address of 64K.
>

Rafał's corruption was definitely non-zero. I think the corruption is
still happening; it's just not being reported.
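
For reference, the difference between the two fill values can be
sketched in user space as below. This is only an illustration, not code
from the patch; fill_area() and first_mismatch() are hypothetical names.
A zero fill can only reveal non-zero writes, whereas a 0x55 fill would
also reveal anything that writes zeroes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill the region with a known byte, e.g. 0 or 0x55. */
void fill_area(unsigned char *area, size_t size, unsigned char pattern)
{
	assert(area != NULL);
	memset(area, pattern, size);
}

/*
 * Return the offset of the first byte that no longer matches the fill
 * pattern, or size if the region is intact.
 */
size_t first_mismatch(const unsigned char *area, size_t size,
		      unsigned char pattern)
{
	size_t off;

	assert(area != NULL);
	for (off = 0; off < size; off++)
		if (area[off] != pattern)
			return off;
	return size;
}
```

Since the corruption seen here was non-zero, a zero fill is enough to
catch it once the check is actually called.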

    J