Date:	Tue, 03 Oct 2006 10:40:31 +0200
From:	Martin Devera <devik@....cz>
To:	Andrew Morton <akpm@...l.org>
CC:	linux-kernel@...r.kernel.org
Subject: Re: stat of /proc fails after CPU hot-unplug with EOVERFLOW in 2.6.18

Andrew Morton wrote:
> On Wed, 27 Sep 2006 09:55:47 +0200
> Martin Devera <devik@....cz> wrote:
> 
>> Hello,
>>
>> I have 2way Opteron machine. I've done this:
>> echo 0 > /sys/devices/system/cpu/cpu1/online
>>
>> and then strace stat /proc:
>>
>> [snip]
>> personality(PER_LINUX)                  = 4194304
>> getpid()                                = 14926
>> brk(0)                                  = 0x804b000
>> brk(0x804b1a0)                          = 0x804b1a0
>> brk(0x804c000)                          = 0x804c000
>> stat("/proc", 0xbf8e7490)               = -1 EOVERFLOW
>>
>> When I do echo 1 > ... to start cpu again then the stat starts
>> to work again ... Weird.

Hello,
I just want to make more information public. It seems the problem goes deeper.
The 2.6.18 kernel has crashed the machine 4 times so far. The symptoms: the
network was working and ssh was functional, but I was not able to run a single
binary except "cat"; the others gave me "Permission denied" or "Bus error".
I was not experimenting with CPU hotplug this time. The machine had been up
on 2.6.17.1 for six months with no problems.
I also found odd errors on the console, such as tg3 watchdog timeouts and
SATA read errors (on all sectors). It looks like memory corruption to me. It is
worth noting that the lockup always occurred after high load.
We use an MSI Far2 dual-Opteron motherboard.

All related info is at http://luxik.cdi.cz/~devik/files/2618-corrupt/ along
with the 2.6.17.1 config (for comparison).
The main problem is that I have no similar server on which to reproduce the
problem off-site, so please take this report as mainly informational. I hope to
replace the server in a few weeks so I can investigate further. For now we are
back on 2.6.17.1.

Martin