Message-Id: <20060907172116.452ff45d.akpm@osdl.org>
Date: Thu, 7 Sep 2006 17:21:16 -0700
From: Andrew Morton <akpm@...l.org>
To: "Om Narasimhan" <om.turyx@...il.com>
Cc: linux-kernel@...r.kernel.org
Subject: Re: what is the expected behaviour under extreme high load.
On Wed, 6 Sep 2006 16:53:55 -0700
"Om Narasimhan" <om.turyx@...il.com> wrote:
> Hi,
> I am running a stress test on my SunFire 4600 (8x2core, 64G) using
> mem_test, available from
> http://carpanta.dc.fi.udc.es/~quintela/memtest. I am using SuSE
> Enterprise 9 SP3.
>
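(mem_test itself isn't reproduced here; a minimal C sketch of the same kind
of anonymous-memory stress - not the real tool, and the 64MB CHUNK size is
arbitrary - looks roughly like this:)

/* Keep allocating fixed-size chunks and touch every page so the kernel
 * must actually back them, until malloc() fails or the process is killed. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK (64UL * 1024 * 1024)	/* 64MB per allocation, arbitrary */

int main(void)
{
	unsigned long total_mb = 0;

	for (;;) {
		char *p = malloc(CHUNK);

		if (!p)
			break;
		memset(p, 0xaa, CHUNK);	/* fault every page in */
		total_mb += CHUNK >> 20;
		printf("allocated %lu MB\n", total_mb);
	}
	printf("malloc() failed after %lu MB\n", total_mb);
	return 0;
}

Run enough copies of something like that in parallel and, with the default
overcommit setting, the box dips into swap until the OOM killer fires,
which matches the log quoted below.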
> I am wondering what is the expected behaviour of a machine under
> extreme VM stress.
> When I stress the system to its limits, it becomes practically
> unresponsive. It runs for almost half an hour and then crashes
> because of a CPU lockup.
>
> Any pointers on where I can start debugging this issue?
It's an old kernel. Newer ones might be better.
> From top,
> load average: 190.70, 132.45, 60.43
> # cat /proc/sys/vm/overcommit_memory
> 0
>
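For reference, 0 there is the default heuristic overcommit mode
(1 = always overcommit, 2 = strict accounting against the commit limit).
A small C sketch that just reads the mode back through the standard sysctl
file above would be:

/* Print the current VM overcommit mode.
 * 0 = heuristic overcommit (default), 1 = always overcommit,
 * 2 = strict accounting against the commit limit. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
	int mode;

	if (!f || fscanf(f, "%d", &mode) != 1) {
		perror("overcommit_memory");
		return 1;
	}
	fclose(f);
	printf("vm.overcommit_memory = %d\n", mode);
	return 0;
}

Mode 2 (with a sensible vm.overcommit_ratio) tends to make allocations fail
up front rather than leaving everything to the OOM killer, which can make
this sort of stress test die more gracefully.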
> The final crash message on the serial console is:
>
> ...
>
> Free swap: 0kB
It ran out of swap space; the oom-killing is the correct behaviour.
> 16777208 pages of RAM
> 430115 reserved pages
> 69705619 pages shared
> 32 pages swap cached
> Out of Memory: Killed process 12726 (mtest).
> NMI Watchdog detected LOCKUP on CPU14, registers:
> CPU 14
> Pid: 12756, comm: mtest Tainted: G U (2.6.5-7.244-smp-dbg )
> RIP: 0010:[<ffffffff80180342>] <ffffffff80180342>{.text.lock.vmscan+18}
> RSP: 0000:0000010dfbd31a38 EFLAGS: 00000082
> RAX: 0000010603945a10 RBX: 0000000000000000 RCX: 0000010601cb7128
> RDX: 0000010dfbd31b48 RSI: 0000010600004408 RDI: 0000000000000000
> RBP: 0000010600004380 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000064 R11: 0000000000000080 R12: 0000010600004380
> R13: 0000010600004400 R14: 0000010dfbd31b48 R15: 0000010dfbd31b28
> FS: 0000002a95894b00(0000) GS:ffffffff80610c80(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000002ae2f57010 CR3: 00000001fff2d000 CR4: 00000000000006e0
> Process mtest (pid: 12756, threadinfo 0000010dfbd30000, task 0000010dfede0280)
> Stack: 0000000000000000 0000000100000000 0000010dfbd31c08 0000008000000246
> 0000000000000246 0000000000000246 0000010dfede0280 0000000000000286
> 0000010975a5ad30 ffffffff805353e0
> Call Trace:<ffffffff8017f5c2>{shrink_zone+226}
> <ffffffff8017fed8>{try_to_free_pages+280}
> <ffffffff80140cb0>{autoremove_wake_function+0}
> <ffffffff80174a0f>{__alloc_pages+559}
> <ffffffff80185393>{handle_mm_fault+2323}
> <ffffffff80123e80>{do_page_fault+0}
> <ffffffff80124054>{do_page_fault+468} <ffffffff8013be17>{thread_return+0}
> <ffffffff80111465>{error_exit+0}
>
That lockup isn't expected, though. It could be that the page scanner
simply spent too long with interrupts disabled, or some interaction with
the oom-killer caused the scanner to lock up.
But that's an awfully old kernel - it could be that whatever problem caused
this has been fixed in the intervening years.
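If you do retest on a newer kernel, one cheap way to watch the box approach
the "Free swap: 0kB" state before it stops responding is to poll
/proc/meminfo on the serial console, along these lines (just a sketch;
MemFree/SwapFree are the standard meminfo fields):

/* Poll MemFree and SwapFree once a second so the approach to OOM is
 * visible on the (serial) console before the box stops responding. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char line[128];

	for (;;) {
		FILE *f = fopen("/proc/meminfo", "r");

		if (!f) {
			perror("/proc/meminfo");
			return 1;
		}
		while (fgets(line, sizeof(line), f))
			if (!strncmp(line, "MemFree:", 8) ||
			    !strncmp(line, "SwapFree:", 9))
				fputs(line, stdout);
		fclose(f);
		fflush(stdout);
		sleep(1);
	}
}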
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/