Message-ID: <alpine.DEB.2.00.0910272047430.8988@chino.kir.corp.google.com>
Date:	Tue, 27 Oct 2009 21:08:56 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	vedran.furac@...il.com
cc:	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	minchan.kim@...il.com, Andrew Morton <akpm@...ux-foundation.org>,
	Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: Memory overcommit

On Wed, 28 Oct 2009, Vedran Furac wrote:

> > This is wrong; it doesn't "emulate oom" since oom_kill_process() always 
> > kills a child of the selected process instead if they do not share the 
> > same memory.  The chosen task in that case is untouched.
> 
> OK, I stand corrected then. Thanks! But, while testing this I lost X
> once again and "test" survived for some time (check the timestamps):
> 
> http://pastebin.com/d5c9d026e
> 
> - It started by killing gkrellm(!!!)
> - Then I lost X (kdeinit4 I guess)
> - Then 103 seconds after the killing started, it killed "test" - the
> real culprit.
> 
> I mean... how?!
> 

Here are the five oom kills that occurred in your log; notice that in the 
first four the oom killer kills a child rather than the selected task, as 
I explained:

[97137.724971] Out of memory: kill process 21485 (VBoxSVC) score 1564940 or a child
[97137.725017] Killed process 21503 (VirtualBox)
[97137.864622] Out of memory: kill process 11141 (kdeinit4) score 1196178 or a child
[97137.864656] Killed process 11142 (klauncher)
[97137.888146] Out of memory: kill process 11141 (kdeinit4) score 1184308 or a child
[97137.888180] Killed process 11151 (ksmserver)
[97137.972875] Out of memory: kill process 11141 (kdeinit4) score 1146255 or a child
[97137.972888] Killed process 11224 (audacious2)

Those happen practically simultaneously, with very little memory becoming 
available between each oom kill.  Only later is "test" killed:

[97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
[97240.206832] Killed process 5005 (test)

Notice how the badness score here is less than a quarter of the others'.  
So while you may find "test" to be hogging a lot of memory, there were 
other tasks that consumed much more.
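
To make that child-first behaviour concrete, the selection in 
oom_kill_process() works roughly like the sketch below.  This is a 
simplified paraphrase of the 2.6.31-era mm/oom_kill.c, not the code 
itself; kill_task() stands in for the real oom_kill_task(), and the 
memcg, panic_on_oom and already-exiting special cases are left out:

	/* simplified sketch of mm/oom_kill.c::oom_kill_process() */
	static int oom_kill_process(struct task_struct *p, unsigned long points)
	{
		struct task_struct *c;

		printk(KERN_ERR "Out of memory: kill process %d (%s) "
		       "score %lu or a child\n", task_pid_nr(p), p->comm, points);

		/* try to kill a child that has its own mm first... */
		list_for_each_entry(c, &p->children, sibling) {
			if (c->mm == p->mm)
				continue;	/* shares memory with p, skip it */
			if (!kill_task(c))	/* stand-in for oom_kill_task() */
				return 0;	/* a child died; p itself survives */
		}

		/* ...and only fall back to the selected task itself */
		return kill_task(p);
	}

That is why kdeinit4 shows up as the chosen task three times in a row 
while klauncher, ksmserver and audacious2 are the processes that actually 
die.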

You can get a more detailed understanding of this by doing

	echo 1 > /proc/sys/vm/oom_dump_tasks

before trying your testcase; at the time of oom it will dump information 
such as the total_vm and oom_adj value for each task (the actual badness 
score is also exported per-task, in real time, via /proc/pid/oom_score).  
The dump includes the rss as well, so you can see what the end result 
would be if that value were used as part of the heuristic for this 
particular workload, compared to the current implementation.
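
If it helps, here is a small userspace helper of my own (just an 
illustration, not an existing tool) that walks /proc and prints each 
task's current /proc/<pid>/oom_score next to its pid and name, so you can 
watch the ranking while your testcase runs:

	#include <ctype.h>
	#include <dirent.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		DIR *proc = opendir("/proc");
		struct dirent *de;

		if (!proc)
			return 1;

		while ((de = readdir(proc)) != NULL) {
			char path[64], comm[64] = "?";
			long score = -1;
			FILE *f;

			if (!isdigit((unsigned char)de->d_name[0]))
				continue;	/* not a pid directory */

			snprintf(path, sizeof(path), "/proc/%s/oom_score",
				 de->d_name);
			f = fopen(path, "r");
			if (!f)
				continue;	/* task exited or no score */
			if (fscanf(f, "%ld", &score) != 1)
				score = -1;
			fclose(f);

			/* first line of /proc/<pid>/status is "Name:\t<comm>" */
			snprintf(path, sizeof(path), "/proc/%s/status",
				 de->d_name);
			f = fopen(path, "r");
			if (f) {
				if (fscanf(f, "Name: %63s", comm) != 1)
					strcpy(comm, "?");
				fclose(f);
			}

			printf("%8s %10ld %s\n", de->d_name, score, comm);
		}
		closedir(proc);
		return 0;
	}

Pipe the output through "sort -n -k2" and it should show the same ranking 
the kills above follow, with kdeinit4 and VBoxSVC well above "test".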
