Message-Id: <E0975165-4185-47A9-A15F-B46774A5F6DA@gmail.com>
Date: Mon, 15 Feb 2010 00:43:02 +0100
From: Anton Starikov <ant.starikov@...il.com>
To: linux-kernel@...r.kernel.org
Subject: 2.6.31 and OOM killer = bug?
Hi,
The setup:
A 16-core Opteron node, diskless with NFS root, swapless, 64 GB of RAM, running OpenSUSE 11.2 with kernel version 2.6.31. Although the kernel isn't vanilla, I think it is still appropriate to report this on LKML.
The problem:
On this node a user runs an MPI job with 16 processes; the job is local and uses shared-memory communication.
At some point these processes try to use more memory than is available.
Normally, all or some of them would be killed by the OOM killer, and that has worked for years across many kernel versions.
Now, with the fresh setup, I get something new. The OOM killer tried to kill, but did not succeed, and moreover it left the system in an unusable state. All of those processes are locked and unkillable, and some other processes are also locked and unkillable/inaccessible. kswapd consumes 100% CPU (which I think is the expected behavior when there is no free memory).
Obviously there is no free memory, because all of the original processes are still in memory.
I tried to test the OOM behavior, and it now always ends up like this.
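For reference, a minimal sketch of the kind of trigger I mean (this is not the actual MPI code, just an assumed stand-in: each of 16 processes keeps allocating and touching anonymous memory until the swapless node runs out):

/*
 * Sketch of a memory-overcommit trigger, assuming the OOM condition is
 * reached simply by many processes touching more anonymous memory than
 * the machine has (no swap).  Process count and chunk size are arbitrary.
 */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

#define NPROCS 16
#define CHUNK  (64UL * 1024 * 1024)     /* allocate 64 MB at a time */

static void hog(void)
{
	for (;;) {
		char *p = malloc(CHUNK);
		if (!p)
			pause();                /* nothing left to allocate: just sit */
		else
			memset(p, 0xaa, CHUNK); /* touch the pages so they are really used */
	}
}

int main(void)
{
	int i;

	for (i = 0; i < NPROCS; i++)
		if (fork() == 0)
			hog();
	for (i = 0; i < NPROCS; i++)
		wait(NULL);
	return 0;
}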
I attach a full gzipped log of all related information captured by the log server (it was sent both to the log server and over netconsole, so entries may be partly duplicated). Sorry that it is rather large, but I did not know which information would be important.
Anton.
Download attachment "fixedlog.txt.gz" of type "application/x-gzip" (39050 bytes)