Date:	Mon, 2 May 2011 22:38:24 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Christian Kujau <lists@...dbynature.de>
Cc:	Markus Trippelsdorf <markus@...ppelsdorf.de>,
	LKML <linux-kernel@...r.kernel.org>, xfs@....sgi.com,
	minchan.kim@...il.com
Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks

On Mon, May 02, 2011 at 02:26:17AM -0700, Christian Kujau wrote:
> On Sun, 1 May 2011 at 18:01, Dave Chinner wrote:
> > I really don't know why the xfs inode cache is not being trimmed. I
> > really, really need to know if the XFS inode cache shrinker is
> > getting blocked or not running - do you have those sysrq-w traces
> > when near OOM I asked for a while back?
> 
> Here's another attempt at getting those:
> 
>   http://nerdbynature.de/bits/2.6.39-rc4/oom/
>   * messages-11.txt.gz & slabinfo-11.txt.bz2
>     - oom-killer at 00:05:04
>     - last sysrq-w to succeed at 00:05:03
> 
>   * messages-12.txt.gz & slabinfo-12.txt.bz2, along
>     with meminfo-post-oom-12.txt & sysrq-w_post-oom-12.jpg could
>     be more interesting:
>     - last sysrq-w to succeed at 01:27:08
>     - oom-killer at 01:27:11
> 
>    ...but after the OOM-killer was killing quite a few processes, MemFree
>    showed 511236 kB free memory, yet ssh logins were still being killed.
>    Finally I got a root shell on the box, issued sysrq-w again and even
>    executed /bin/sync, which came back. But looking at the logs now 
>    nothing went to the disk (/var/log resides on / which is a ext4 fs).
>    See sysrq-w_post-oom-12.jpg for a sysrq-w I took 2381s after boot time,
>    or 01:32 - syslog stopped on 01:27.

Same problem:

MemFree:          511236 kB
....
LowTotal:         759904 kB
LowFree:            3804 kB

i.e. low memory is being exhausted by the slab cache while there is
plenty of free high memory, and the low memory zone is marked as all
unreclaimable....
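
For anyone wanting to spot this condition in their own meminfo dumps,
here is a minimal Python sketch. It assumes /proc/meminfo-style text
from a highmem kernel (LowTotal/LowFree are only reported there). The
helper names and the 1% threshold are hypothetical, chosen for
illustration, and are not anything the kernel itself uses.

```python
def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of integer kB values."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines such as "...." that carry no field
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def lowmem_exhausted(fields, threshold=0.01):
    """True if LowFree is below `threshold` (hypothetical cutoff) of LowTotal."""
    low_total = fields.get("LowTotal")
    low_free = fields.get("LowFree")
    if not low_total or low_free is None:
        return False  # not a highmem kernel, or fields missing
    return low_free / low_total < threshold


# Values copied from the post-OOM meminfo quoted above.
SAMPLE = """\
MemFree:          511236 kB
....
LowTotal:         759904 kB
LowFree:            3804 kB
"""

fields = parse_meminfo(SAMPLE)
low_exhausted = lowmem_exhausted(fields)
```

With the numbers above, about 0.5% of low memory is free even though
MemFree looks healthy, so the check reports exhaustion.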

The sysrq trace less than 1s before the first OOM shows this:

[c00770ec] __lock_acquire+0x43c/0x1818 (unreliable)
[c000a924] __switch_to+0x9c/0x128
[c0417580] schedule+0x274/0x8bc
[c0418128] schedule_timeout+0x16c/0x214
[c04172a0] io_schedule_timeout+0xb0/0x11c
[c00b153c] congestion_wait+0x8c/0xdc
[c00aa43c] kswapd+0x6d0/0x884
[c005e3d0] kthread+0x84/0x88
[c0010908] kernel_thread+0x4c/0x68

Background memory reclaim appears to be blocked by IO congestion....
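
If you have many sysrq-w dumps to go through, a rough Python sketch
like the following can flag this pattern. It assumes frames are
formatted as "[address] symbol+0xoff/0xsize", innermost frame first,
as in the trace above. The function names and the regular expression
are my own illustration and not part of any existing tool.

```python
import re

# Matches frames such as "[c00b153c] congestion_wait+0x8c/0xdc".
FRAME_RE = re.compile(
    r"^\[([0-9a-f]+)\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frame_symbols(trace):
    """Return the symbol names of a stack trace, innermost frame first."""
    symbols = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            symbols.append(match.group(2))
    return symbols


def kswapd_in_congestion_wait(symbols):
    """True if congestion_wait appears as a callee below kswapd."""
    if "kswapd" not in symbols or "congestion_wait" not in symbols:
        return False
    return symbols.index("congestion_wait") < symbols.index("kswapd")


# Trace copied from the sysrq-w output quoted above.
TRACE = """\
[c00770ec] __lock_acquire+0x43c/0x1818 (unreliable)
[c000a924] __switch_to+0x9c/0x128
[c0417580] schedule+0x274/0x8bc
[c0418128] schedule_timeout+0x16c/0x214
[c04172a0] io_schedule_timeout+0xb0/0x11c
[c00b153c] congestion_wait+0x8c/0xdc
[c00aa43c] kswapd+0x6d0/0x884
[c005e3d0] kthread+0x84/0x88
[c0010908] kernel_thread+0x4c/0x68
"""

symbols = frame_symbols(TRACE)
blocked = kswapd_in_congestion_wait(symbols)
```

This only checks that kswapd was sleeping in congestion_wait at the
moment of the dump; it says nothing about how long it stayed there.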

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com