[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4DDB0B45.2080507@jp.fujitsu.com>
Date: Tue, 24 May 2011 10:35:01 +0900
From: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To: rientjes@...gle.com
CC: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org, caiqian@...hat.com, hughd@...gle.com,
kamezawa.hiroyu@...fujitsu.com, minchan.kim@...il.com,
oleg@...hat.com
Subject: Re: [PATCH 4/5] oom: don't kill random process
(2011/05/24 7:32), David Rientjes wrote:
> On Fri, 20 May 2011, KOSAKI Motohiro wrote:
>
>> CAI Qian reported oom-killer killed all system daemons in his
>> system at first if he ran fork bomb as root. The problem is,
>> current logic give them bonus of 3% of system ram. Example,
>> he has 16GB machine, then root processes have ~500MB oom
>> immune. It bring us crazy bad result. _all_ processes have
>> oom-score=1 and then, oom killer ignore process memory usage
>> and kill random process. This regression is caused by commit
>> a63d83f427 (oom: badness heuristic rewrite).
>>
>> This patch changes select_bad_process() slightly. If oom points == 1,
>> it's a sign that the system have only root privileged processes or
>> similar. Thus, select_bad_process() calculate oom badness without
>> root bonus and select eligible process.
>>
>
> You said earlier that you thought it was a good idea to do a proportional
> based bonus for root processes. Do you have a specific objection to
> giving root processes a 1% bonus for every 10% of used memory instead?
Because it's completely another topic. You have to maek another patch.
>> Also, this patch move finding sacrifice child logic into
>> select_bad_process(). It's necessary to implement adequate
>> no root bonus recalculation. and it makes good side effect,
>> current logic doesn't behave as the doc.
>>
>
> This is unnecessary and just makes the oom killer egregiously long. We
> are already diagnosing problems here at Google where the oom killer holds
> tasklist_lock on the readside for far too long, causing other cpus waiting
> for a write_lock_irq(&tasklist_lock) to encounter issues when irqs are
> disabled and it is spinning. A second tasklist scan is simply a
> non-starter.
>
> [ This is also one of the reasons why we needed to introduce
> mm->oom_disable_count to prevent a second, expensive tasklist scan. ]
You misunderstand the code. Both select_bad_process() and oom_kill_process()
are under tasklist_lock(). IOW, no change lock holding time.
>> Documentation/sysctl/vm.txt says
>>
>> oom_kill_allocating_task
>>
>> If this is set to non-zero, the OOM killer simply kills the task that
>> triggered the out-of-memory condition. This avoids the expensive
>> tasklist scan.
>>
>> IOW, oom_kill_allocating_task shouldn't search sacrifice child.
>> This patch also fixes this issue.
>>
>
> oom_kill_allocating_task was introduced for SGI to prevent the expensive
> tasklist scan, the task that is actually allocating the memory isn't
> actually interesting and is usually random. This should be turned into a
> documentation fix rather than changing the implementation.
No benefit. I don't take it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists