Message-ID: <452221FF.70405@cdi.cz>
Date: Tue, 03 Oct 2006 10:40:31 +0200
From: Martin Devera <devik@....cz>
To: Andrew Morton <akpm@...l.org>
CC: linux-kernel@...r.kernel.org
Subject: Re: stat of /proc fails after CPU hot-unplug with EOVERFLOW in 2.6.18
Andrew Morton wrote:
> On Wed, 27 Sep 2006 09:55:47 +0200
> Martin Devera <devik@....cz> wrote:
>
>> Hello,
>>
>> I have 2way Opteron machine. I've done this:
>> echo 0 > /sys/devices/system/cpu/cpu1/online
>>
>> and then strace stat /proc:
>>
>> [snip]
>> personality(PER_LINUX) = 4194304
>> getpid() = 14926
>> brk(0) = 0x804b000
>> brk(0x804b1a0) = 0x804b1a0
>> brk(0x804c000) = 0x804c000
>> stat("/proc", 0xbf8e7490) = -1 EOVERFLOW
>>
>> When I do echo 1 > ... to start cpu again then the stat starts
>> to work again ... Weird.
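
A minimal sketch for re-running the check quoted above, before and after
toggling the CPU online state. The helper name check_stat is hypothetical and
not part of the original report. Note the assumption: the quoted trace appears
to come from a 32-bit binary using the legacy stat call, while Python on a
64-bit system goes through a different stat interface, so this may not hit the
same EOVERFLOW on every setup.

```python
import errno
import os


def check_stat(path):
    """Call stat() on path and return (ok, detail).

    On failure, detail carries the symbolic errno name (for example
    EOVERFLOW) so results can be compared against strace output.
    """
    try:
        st = os.stat(path)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"{name}: {exc.strerror}"
    # Link count and inode number are printed only as diagnostic context.
    return True, f"nlink={st.st_nlink} ino={st.st_ino}"


if __name__ == "__main__":
    ok, detail = check_stat("/proc")
    print("stat(/proc)", "ok" if ok else "failed", detail)
```

Running it once with cpu1 online and once with it offline shows whether the
failure follows the hotplug state on a given machine.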
Hello,
I just want to make more information public. It seems that the problem goes
deeper. The 2.6.18 kernel has crashed the machine 4 times so far. Symptoms:
networking kept working and ssh was functional, but I was not able to run a
single binary except "cat"; all others gave me "Permission denied" or
"Bus error".
I was not experimenting with CPU hotplug this time. The machine had been up
on 2.6.17.1 for six months with no problems.
I also found weird errors on the console, such as tg3 watchdog timeouts and
SATA read errors (on all sectors). It looks like memory corruption to me. It
is worth noting that the lockup always occurred after high load.
We use an MSI Far2 dual-Opteron motherboard.
All related info is at http://luxik.cdi.cz/~devik/files/2618-corrupt/ along
with 2.6.17.1 config (for comparison).
The main problem is that I have no similar server on which to reproduce the
problem off-site. So please take this report as mainly informative. I hope to
replace the server in a few weeks so that I can investigate further. For now
we are back on 2.6.17.1.
Martin