Message-ID: <4B1D7B47.4000306@kernel.org>
Date: Mon, 07 Dec 2009 14:01:43 -0800
From: Yinghai Lu <yinghai@...nel.org>
To: Volker Lanz <vl@...ra.de>
CC: linux-kernel@...r.kernel.org, mingo@...e.hu
Subject: Re: [BISECTED, REGRESSION] Successful resume from suspend but freezes after I/O
Volker Lanz wrote:
> On Monday 07 December 2009 20:23:08 Yinghai Lu wrote:
>> Volker Lanz wrote:
>>> On Monday 07 December 2009 19:24:02 Yinghai Lu wrote:
>>>> Volker Lanz wrote:
>>>>> Hi,
>>>>>
>>>>> updating to my distro's new 2.6.31 kernel on an x86_64 quad core
>>>>> machine with 6 GB of RAM I noticed resuming from suspend still worked
>>>>> as before, but the machine will now reproducibly freeze (have to hard
>>>>> reset) afterwards as soon as I do something disk I/O heavy, though the
>>>>> problem is probably not related to disk activity at all.
>>>>>
>>>>> A current mainline 2.6.32 checkout shows the same behaviour.
>>>>>
>>>>> I git-bisected the problem to this commit:
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------------
>>>>> commit 78a8b35bc7abf8b8333d6f625e08c0f7cc1c3742
>>>>> Author: Yinghai Lu <yinghai@...nel.org>
>>>>> Date: Thu Mar 12 22:36:01 2009 -0700
>>>>>
>>>>> x86: make e820_update_range() handle small range update
>>>>>
>>>>> Impact: enhance e820 code to handle more cases
>>>>>
>>>>> Try to handle new range which could be covered by one entry.
>>>>>
>>>>> Signed-off-by: Yinghai Lu <yinghai@...nel.org>
>>>>> Cc: jbeulich@...ell.com
>>>>> LKML-Reference: <49B9F0C1.10402@...nel.org>
>>>>> Signed-off-by: Ingo Molnar <mingo@...e.hu>
>>>>> ---------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> A kernel built from this revision does not boot, so the first booting
>>>>> kernel to show the problem actually seems to be:
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------------
>>>>> commit 6d7942dc2a70a7e74c352107b150265602671588
>>>>> Author: Yinghai Lu <yinghai@...nel.org>
>>>>> Date: Sat Mar 14 14:32:41 2009 -0700
>>>>>
>>>>> x86: fix 64k corruption-check
>>>>>
>>>>> Impact: fix boot crash
>>>>>
>>>>> Need to exit early if the addr is far above 64k.
>>>>>
>>>>> The crash got exposed by:
>>>>>
>>>>> 78a8b35: x86: make e820_update_range() handle small range update
>>>>>
>>>>> Signed-off-by: Yinghai Lu <yinghai@...nel.org>
>>>>> Cc: <stable@...nel.org>
>>>>> LKML-Reference: <49BC2279.2030101@...nel.org>
>>>>> Signed-off-by: Ingo Molnar <mingo@...e.hu>
>>>>> ---------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> The last kernel to work without problems thus seems to be this one:
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------------
>>>>> commit 773e673de27297d07d852e7e9bfd1a695cae1da2
>>>>> Author: Yinghai Lu <yinghai@...nel.org>
>>>>> Date: Thu Mar 12 21:35:18 2009 -0700
>>>>>
>>>>> x86: fix e820_update_range()
>>>>>
>>>>> Impact: fix left range size on head
>>>>>
>>>>> | commit 5c0e6f035df983210e4d22213aed624ced502d3d
>>>>> | x86: fix code paths used by update_mptable
>>>>> | Impact: fix crashes under Xen due to unrobust e820 code
>>>>>
>>>>> fixes one e820 bug, but introduces another bug.
>>>>>
>>>>> Need to update size for left range at first in case it is header.
>>>>>
>>>>> also add __e820_add_region take more parameter.
>>>>>
>>>>> Signed-off-by: Yinghai Lu <yinghai@...nel.org>
>>>>> Cc: jbeulich@...ell.com
>>>>> LKML-Reference: <49B9E286.502@...nel.org>
>>>>> Signed-off-by: Ingo Molnar <mingo@...e.hu>
>>>>> ---------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> The problem is 100% reproducible on this machine: Resuming and then
>>>>> copying /usr/ to $HOME will freeze after a few hundred MB have been
>>>>> copied. Earlier kernels worked fine for the last couple of months.
>>>>>
>>>>> What additional information is required to help diagnose and hopefully
>>>>> fix the problem?
>>>> whole boot log with CONFIG_PCI_DEBUG and debug on command line.
>>> Here it is. It's huge, I hope you were expecting that...
>> and the one with current tip?
>>
>> http://people.redhat.com/mingo/tip.git/readme.txt
>
> With this kernel, the problem persists. Here's the log:
>
>
> -----------------------------------------------------------------------------
> [ 0.000000] Initializing cgroup subsys cpuset
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Linux version 2.6.32-tip-02731-gd17424f (vl@...vor) (gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu8) ) #22 SMP Mon Dec 7 21:09:21 CET 2009
> [ 0.000000] Command line: root=UUID=160351ee-c9b0-4a72-9fd5-9962c8137a7e ro nosplash debug
> [ 0.000000] BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
> [ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
> [ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> [ 0.000000] BIOS-e820: 0000000000100000 - 00000000cfee0000 (usable)
> [ 0.000000] BIOS-e820: 00000000cfee0000 - 00000000cfee2000 (ACPI NVS)
> [ 0.000000] BIOS-e820: 00000000cfee2000 - 00000000cfef0000 (ACPI data)
> [ 0.000000] BIOS-e820: 00000000cfef0000 - 00000000cff00000 (reserved)
> [ 0.000000] BIOS-e820: 00000000e0000000 - 00000000e4000000 (reserved)
> [ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> [ 0.000000] BIOS-e820: 0000000100000000 - 00000001b0000000 (usable)
> [ 0.000000] NX (Execute Disable) protection: active
> [ 0.000000] DMI 2.4 present.
> [ 0.000000] last_pfn = 0x1b0000 max_arch_pfn = 0x400000000
> [ 0.000000] MTRR default type: uncachable
> [ 0.000000] MTRR fixed ranges enabled:
> [ 0.000000] 00000-9FFFF write-back
> [ 0.000000] A0000-BFFFF uncachable
> [ 0.000000] C0000-CBFFF write-protect
> [ 0.000000] CC000-EFFFF uncachable
> [ 0.000000] F0000-FFFFF write-through
> [ 0.000000] MTRR variable ranges enabled:
> [ 0.000000] 0 base 000000000 mask F00000000 write-back
> [ 0.000000] 1 base 0E0000000 mask FE0000000 uncachable
> [ 0.000000] 2 base 0D0000000 mask FF0000000 uncachable
> [ 0.000000] 3 base 100000000 mask F00000000 write-back
> [ 0.000000] 4 base 1C0000000 mask FC0000000 uncachable
> [ 0.000000] 5 base 1B0000000 mask FF0000000 uncachable
> [ 0.000000] 6 base 0CFF00000 mask FFFF00000 uncachable
> [ 0.000000] 7 disabled
> [ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
> [ 0.000000] e820 update range: 00000000cff00000 - 0000000100000000 (usable) ==> (reserved)
> [ 0.000000] last_pfn = 0xcfee0 max_arch_pfn = 0x400000000
> [ 0.000000] e820 update range: 0000000000001000 - 0000000000006000 (usable) ==> (reserved)
> [ 0.000000] Scanning 1 areas for low memory corruption
> [ 0.000000] modified physical RAM map:
> [ 0.000000] modified: 0000000000000000 - 0000000000001000 (usable)
> [ 0.000000] modified: 0000000000001000 - 0000000000006000 (reserved)
> [ 0.000000] modified: 0000000000006000 - 000000000009f800 (usable)
> [ 0.000000] modified: 000000000009f800 - 00000000000a0000 (reserved)
> [ 0.000000] modified: 00000000000f0000 - 0000000000100000 (reserved)
> [ 0.000000] modified: 0000000000100000 - 00000000cfee0000 (usable)
> [ 0.000000] modified: 00000000cfee0000 - 00000000cfee2000 (ACPI NVS)
> [ 0.000000] modified: 00000000cfee2000 - 00000000cfef0000 (ACPI data)
> [ 0.000000] modified: 00000000cfef0000 - 00000000cff00000 (reserved)
> [ 0.000000] modified: 00000000e0000000 - 00000000e4000000 (reserved)
> [ 0.000000] modified: 00000000fec00000 - 0000000100000000 (reserved)
> [ 0.000000] modified: 0000000100000000 - 00000001b0000000 (usable)
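
For reference, the split visible in the modified map above (0x1000 - 0x6000
turned from usable into reserved inside the first usable entry) can be
sketched roughly as below. This is only an illustration of the idea of
retyping a small range that is covered by one entry. It is not the kernel's
e820_update_range(); the names map_entry, map_update_range, MAP_USABLE,
MAP_RESERVED and MAP_MAX are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative type codes only; not taken from kernel headers. */
enum map_type { MAP_USABLE = 1, MAP_RESERVED = 2 };

struct map_entry {
	uint64_t start;		/* inclusive */
	uint64_t end;		/* exclusive */
	enum map_type type;
};

#define MAP_MAX 32		/* the caller's array must hold this many entries */

/*
 * Hypothetical helper: retype the part of every old_type entry that
 * overlaps [start, end). An entry that fully covers the range is split
 * into up to three pieces (head, retyped middle, tail).
 * Returns the new number of entries.
 */
int map_update_range(struct map_entry *map, int n,
		     uint64_t start, uint64_t end,
		     enum map_type old_type, enum map_type new_type)
{
	struct map_entry out[MAP_MAX];
	int m = 0;

	for (int i = 0; i < n; i++) {
		struct map_entry e = map[i];

		/* Untouched: wrong type, or no overlap with [start, end). */
		if (e.type != old_type || e.end <= start || e.start >= end) {
			assert(m < MAP_MAX);
			out[m++] = e;
			continue;
		}

		/* Clamp the requested range to this entry. */
		uint64_t lo = e.start > start ? e.start : start;
		uint64_t hi = e.end < end ? e.end : end;

		assert(m + 3 <= MAP_MAX);
		if (e.start < lo)	/* head keeps the old type */
			out[m++] = (struct map_entry){ e.start, lo, old_type };
		out[m++] = (struct map_entry){ lo, hi, new_type };
		if (hi < e.end)		/* tail keeps the old type */
			out[m++] = (struct map_entry){ hi, e.end, old_type };
	}

	for (int i = 0; i < m; i++)
		map[i] = out[i];
	return m;
}
```

Calling it on a single usable entry 0x0 - 0x9f800 with the range
0x1000 - 0x6000 gives three entries, matching the first three "modified:"
lines in the log.
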
Can you try disabling
CONFIG_X86_CHECK_BIOS_CORRUPTION
and
CONFIG_X86_RESERVE_LOW_64K
and see whether the freeze after resume still happens?
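
A disabled option shows up in .config as a comment line, so after
reconfiguring (for example via make menuconfig) and before rebuilding, the
two entries would be expected to look like this:

```
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
# CONFIG_X86_RESERVE_LOW_64K is not set
```
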
YH