Message-ID: <20110502123824.GB2978@dastard>
Date: Mon, 2 May 2011 22:38:24 +1000
From: Dave Chinner <david@...morbit.com>
To: Christian Kujau <lists@...dbynature.de>
Cc: Markus Trippelsdorf <markus@...ppelsdorf.de>,
LKML <linux-kernel@...r.kernel.org>, xfs@....sgi.com,
minchan.kim@...il.com
Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks
On Mon, May 02, 2011 at 02:26:17AM -0700, Christian Kujau wrote:
> On Sun, 1 May 2011 at 18:01, Dave Chinner wrote:
> > I really don't know why the xfs inode cache is not being trimmed. I
> > really, really need to know if the XFS inode cache shrinker is
> > getting blocked or not running - do you have those sysrq-w traces
> > when near OOM I asked for a while back?
>
> Here's another attempt at getting those:
>
> http://nerdbynature.de/bits/2.6.39-rc4/oom/
> * messages-11.txt.gz & slabinfo-11.txt.bz2
> - oom-killer at 00:05:04
> - last sysrq-w to succeed at 00:05:03
>
> * messages-12.txt.gz & slabinfo-12.txt.bz2, along
> with meminfo-post-oom-12.txt & sysrq-w_post-oom-12.jpg could
> be more interesting:
> - last sysrq-w to succeed at 01:27:08
> - oom-killer at 01:27:11
>
> ...but after the OOM-killer was killing quite a few processes, MemFree
> showed 511236 kB free memory, yet ssh logins were still being killed.
> Finally I got a root shell on the box, issued sysrq-w again and even
> executed /bin/sync, which came back. But looking at the logs now
> nothing went to the disk (/var/log resides on / which is a ext4 fs).
> See sysrq-w_post-oom-12.jpg for a sysrq-w I took 2381s after boot time,
> or 01:32 - syslog stopped on 01:27.

Same problem:

	MemFree: 511236 kB
	....
	LowTotal: 759904 kB
	LowFree: 3804 kB

i.e. low memory is being exhausted by the slab cache while there is
lots of free high memory, and the low memory zone is marked as all
unreclaimable....
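
To make the imbalance concrete, here is a minimal sketch that works out the numbers from the meminfo excerpt above. The helper names (parse_meminfo, lowmem_report) are made up for illustration, and it assumes a 32-bit highmem kernel where MemFree is the sum of LowFree and HighFree.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines such as "...." that carry no field
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def lowmem_report(fields):
    """Summarise how much of the free memory is actually low memory.

    Assumption: MemFree = LowFree + HighFree (32-bit highmem kernel).
    """
    low_total = fields["LowTotal"]
    low_free = fields["LowFree"]
    mem_free = fields["MemFree"]
    return {
        "low_free_pct": 100.0 * low_free / low_total,
        "high_free_kb": mem_free - low_free,
    }


# Values copied from the meminfo excerpt quoted in this mail.
SAMPLE = """MemFree: 511236 kB
LowTotal: 759904 kB
LowFree: 3804 kB
"""

fields = parse_meminfo(SAMPLE)
report = lowmem_report(fields)
print("low memory free: %.2f%%" % report["low_free_pct"])
print("free memory that is highmem: %d kB" % report["high_free_kb"])
```

With these values roughly half a percent of low memory is free, and nearly all of the 511236 kB reported as MemFree is high memory, which slab allocations cannot use.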
The sysrq trace less than 1s before the first OOM shows this:
[c00770ec] __lock_acquire+0x43c/0x1818 (unreliable)
[c000a924] __switch_to+0x9c/0x128
[c0417580] schedule+0x274/0x8bc
[c0418128] schedule_timeout+0x16c/0x214
[c04172a0] io_schedule_timeout+0xb0/0x11c
[c00b153c] congestion_wait+0x8c/0xdc
[c00aa43c] kswapd+0x6d0/0x884
[c005e3d0] kthread+0x84/0x88
[c0010908] kernel_thread+0x4c/0x68
Background memory reclaim appears to be blocked by IO congestion....
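
For anyone going through many of these sysrq-w dumps, a rough sketch of how the check could be automated: pull the symbol names out of a backtrace and see whether kswapd is sitting in congestion_wait. The regex and the function names are assumptions based on the trace format shown above, not any kernel tool.

```python
import re

# Matches frames such as "[c00b153c] congestion_wait+0x8c/0xdc".
FRAME_RE = re.compile(r"\[([0-9a-f]+)\]\s+([A-Za-z_][A-Za-z0-9_.]*)\+0x")


def frame_symbols(trace_text):
    """Return the symbol names of a backtrace, innermost frame first."""
    return [m.group(2) for m in FRAME_RE.finditer(trace_text)]


def kswapd_in_congestion_wait(trace_text):
    """True if the trace is a kswapd stack that passes through congestion_wait."""
    symbols = frame_symbols(trace_text)
    return "kswapd" in symbols and "congestion_wait" in symbols


# Backtrace copied from the sysrq-w output quoted in this mail.
TRACE = """[c00770ec] __lock_acquire+0x43c/0x1818 (unreliable)
[c000a924] __switch_to+0x9c/0x128
[c0417580] schedule+0x274/0x8bc
[c0418128] schedule_timeout+0x16c/0x214
[c04172a0] io_schedule_timeout+0xb0/0x11c
[c00b153c] congestion_wait+0x8c/0xdc
[c00aa43c] kswapd+0x6d0/0x884
[c005e3d0] kthread+0x84/0x88
[c0010908] kernel_thread+0x4c/0x68
"""

symbols = frame_symbols(TRACE)
blocked = kswapd_in_congestion_wait(TRACE)
print(symbols)
print("kswapd waiting on congestion:", blocked)
```

Note that the first frame is flagged "(unreliable)" in the dump, so only the frames from __switch_to down should be trusted.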
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com