[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4AE846E8.1070303@gmail.com>
Date: Wed, 28 Oct 2009 14:28:08 +0100
From: Vedran Furač <vedran.furac@...il.com>
To: David Rientjes <rientjes@...gle.com>
CC: Hugh Dickins <hugh.dickins@...cali.co.uk>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
minchan.kim@...il.com, Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: Memory overcommit
David Rientjes wrote:
> On Wed, 28 Oct 2009, Vedran Furac wrote:
>
>>> This is wrong; it doesn't "emulate oom" since oom_kill_process() always
>>> kills a child of the selected process instead if they do not share the
>>> same memory. The chosen task in that case is untouched.
>> OK, I stand corrected then. Thanks! But, while testing this I lost X
>> once again and "test" survived for some time (check the timestamps):
>>
>> http://pastebin.com/d5c9d026e
>>
>> - It started by killing gkrellm(!!!)
>> - Then I lost X (kdeinit4 I guess)
>> - Then 103 seconds after the killing started, it killed "test" - the
>> real culprit.
>>
>> I mean... how?!
>>
>
> Here are the five oom kills that occurred in your log, and notice that the
> first four times it kills a child and not the actual task as I explained:
Yes, but four times wrong.
> Those are practically happening simultaneously with very little memory
> being available between each oom kill. Only later is "test" killed:
>
> [97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
> [97240.206832] Killed process 5005 (test)
>
> Notice how the badness score is less than 1/4th of the others. So while
> you may find it to be hogging a lot of memory, there were others that
> consumed much more.
^^^^^^^^^^^^^^^^^^^^^
This is just wrong. I have 3.5GB of RAM, free says that 2GB are empty
(ignoring cache). Culprit then allocates all free memory (2GB). That
means it is using *more* than all other processes *together*. There
cannot be any other "that consumed much more".
> You can get a more detailed understanding of this by doing
>
> echo 1 > /proc/sys/vm/oom_dump_tasks
>
> before trying your testcase; it will show various information like the
> total_vm
Looking at total_vm (VIRT in top/vsize in ps?) is completely wrong. If I
sum up those numbers for every process running I would get:
%ps -eo pid,vsize,command|awk '{ SUM += $2} END {print SUM/1024/1024}'
14.7935
14GB. And I only have 3GB. I usually use exmap to get realistic numbers:
http://www.berthels.co.uk/exmap/doc.html
> and oom_adj value for each task at the time of oom (and the
> actual badness score is exported per-task via /proc/pid/oom_score in
> real-time). This will also include the rss and show what the end result
> would be in using that value as part of the heuristic on this particular
> workload compared to the current implementation.
Thanks, I'll try that... but I guess that using rss would yield better
results.
Regards,
Vedran
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists