Message-ID: <alpine.DEB.2.00.0910272047430.8988@chino.kir.corp.google.com>
Date: Tue, 27 Oct 2009 21:08:56 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: vedran.furac@...il.com
cc: Hugh Dickins <hugh.dickins@...cali.co.uk>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
minchan.kim@...il.com, Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: Memory overcommit
On Wed, 28 Oct 2009, Vedran Furac wrote:
> > This is wrong; it doesn't "emulate oom" since oom_kill_process() always
> > kills a child of the selected process instead if they do not share the
> > same memory. The chosen task in that case is untouched.
>
> OK, I stand corrected then. Thanks! But, while testing this I lost X
> once again and "test" survived for some time (check the timestamps):
>
> http://pastebin.com/d5c9d026e
>
> - It started by killing gkrellm(!!!)
> - Then I lost X (kdeinit4 I guess)
> - Then 103 seconds after the killing started, it killed "test" - the
> real culprit.
>
> I mean... how?!
>
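The child-first behaviour described above can be modelled in a few lines. This is a simplified Python sketch, not the kernel code. `Task`, `kill_child_first()` and the integer `mm_id` are hypothetical names; `mm_id` stands in for the kernel's comparison of `mm_struct` pointers. A killed child is simply marked dead rather than exiting and being reaped, and a kill never fails here.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    """Hypothetical stand-in for a task. mm_id identifies the address space;
    the kernel compares mm_struct pointers instead."""
    pid: int
    comm: str
    mm_id: int
    children: List["Task"] = field(default_factory=list)
    alive: bool = True

def kill_child_first(selected: Task) -> Task:
    """Kill the first live child that does not share the selected task's
    memory; kill the selected task only if no such child exists.
    Returns whichever task was killed (simplified: kills never fail)."""
    for child in selected.children:
        if not child.alive or child.mm_id == selected.mm_id:
            continue  # already dead, or shares memory with the parent
        child.alive = False
        return child
    selected.alive = False
    return selected

# kdeinit4 and three children from the log; the mm ids are made up.
kdeinit4 = Task(11141, "kdeinit4", mm_id=1, children=[
    Task(11142, "klauncher", mm_id=2),
    Task(11151, "ksmserver", mm_id=3),
    Task(11224, "audacious2", mm_id=4),
])

# Selecting kdeinit4 three times kills three different children.
victims = [kill_child_first(kdeinit4).comm for _ in range(3)]
print(victims)  # ['klauncher', 'ksmserver', 'audacious2']
```

In this model kdeinit4 itself survives all three selections. Only a fourth selection, with no eligible child left, would kill it.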
Here are the five oom kills that occurred in your log. Notice that the
first four times it killed a child rather than the selected task itself, as I explained:
[97137.724971] Out of memory: kill process 21485 (VBoxSVC) score 1564940 or a child
[97137.725017] Killed process 21503 (VirtualBox)
[97137.864622] Out of memory: kill process 11141 (kdeinit4) score 1196178 or a child
[97137.864656] Killed process 11142 (klauncher)
[97137.888146] Out of memory: kill process 11141 (kdeinit4) score 1184308 or a child
[97137.888180] Killed process 11151 (ksmserver)
[97137.972875] Out of memory: kill process 11141 (kdeinit4) score 1146255 or a child
[97137.972888] Killed process 11224 (audacious2)
Those happened practically simultaneously, with very little memory
available between successive oom kills. Only later is "test" killed:
[97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
[97240.206832] Killed process 5005 (test)
Notice that its badness score is less than a quarter of each of the others.
So while you may find "test" to be hogging a lot of memory, there were
others that consumed much more.
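The "less than a quarter" comparison can be checked directly against the log. The snippet below divides the score printed for "test" by each score printed for the earlier selections. These are the scores of the selected task (VBoxSVC, kdeinit4), not of the child that was actually killed, and they were computed at slightly different moments.

```python
# Scores printed in the "Out of memory: kill process ... score N" lines above.
earlier_selections = [
    ("VBoxSVC", 1564940),
    ("kdeinit4", 1196178),
    ("kdeinit4", 1184308),
    ("kdeinit4", 1146255),
]
test_score = 256912  # score printed when "test" itself was selected

ratios = [test_score / score for _, score in earlier_selections]
for (name, score), ratio in zip(earlier_selections, ratios):
    print(f"test/{name} ({score}): {ratio:.3f}")

# Every earlier selection scored more than four times higher than "test".
assert all(r < 0.25 for r in ratios)
```

The largest ratio, against the last kdeinit4 selection, is about 0.224.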
You can get a more detailed picture of this by running
echo 1 > /proc/sys/vm/oom_dump_tasks
before trying your testcase. At the time of the oom it will show
information such as the total_vm and oom_adj value for each task (the
actual badness score is also exported per task, in real time, via
/proc/pid/oom_score). The dump also includes each task's rss, so it shows
what the end result would be on this particular workload if that value
were used as part of the heuristic, compared to the current implementation.
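To watch those per-task scores while the testcase runs, something like the following could poll /proc. It is a hypothetical helper, not part of any kernel tool. It assumes each /proc/<pid>/oom_score holds a single decimal integer and takes the task name from the "Name:" line of /proc/<pid>/status. The scale of the score differs between kernel versions, so compare values only within one run.

```python
import os
from typing import List, Tuple

def read_oom_scores(proc: str = "/proc") -> List[Tuple[int, int, str]]:
    """Return (oom_score, pid, name) for every readable task, highest first.

    Hypothetical helper: assumes /proc/<pid>/oom_score holds one decimal
    integer and takes the name from the "Name:" line of /proc/<pid>/status.
    """
    results = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/sys
        base = os.path.join(proc, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            name = "?"
            with open(os.path.join(base, "status")) as f:
                for line in f:
                    if line.startswith("Name:"):
                        name = line.split(":", 1)[1].strip()
                        break
        except (OSError, ValueError):
            continue  # task exited mid-scan, or the file was unreadable
        results.append((score, int(entry), name))
    results.sort(reverse=True)  # highest badness score first
    return results

if __name__ == "__main__" and os.path.isdir("/proc"):
    # Print the ten tasks with the highest current oom_score.
    for score, pid, name in read_oom_scores()[:10]:
        print(f"{pid:>7} {score:>10} {name}")
```

The `proc` parameter exists only so the function can be pointed at a copied or fake directory tree; in normal use the default is what you want.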