linux-kernel - Re: [PATCH 4/5] oom: don't kill random process

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 24 May 2011 10:35:01 +0900
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	rientjes@...gle.com
CC:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	akpm@...ux-foundation.org, caiqian@...hat.com, hughd@...gle.com,
	kamezawa.hiroyu@...fujitsu.com, minchan.kim@...il.com,
	oleg@...hat.com
Subject: Re: [PATCH 4/5] oom: don't kill random process

(2011/05/24 7:32), David Rientjes wrote:
> On Fri, 20 May 2011, KOSAKI Motohiro wrote:
>
>> CAI Qian reported oom-killer killed all system daemons in his
>> system at first if he ran fork bomb as root. The problem is,
>> current logic give them bonus of 3% of system ram. Example,
>> he has 16GB machine, then root processes have ~500MB oom
>> immune. It bring us crazy bad result. _all_ processes have
>> oom-score=1 and then, oom killer ignore process memory usage
>> and kill random process. This regression is caused by commit
>> a63d83f427 (oom: badness heuristic rewrite).
>>
>> This patch changes select_bad_process() slightly. If oom points == 1,
>> it's a sign that the system have only root privileged processes or
>> similar. Thus, select_bad_process() calculate oom badness without
>> root bonus and select eligible process.
>>
>
> You said earlier that you thought it was a good idea to do a proportional
> based bonus for root processes.  Do you have a specific objection to
> giving root processes a 1% bonus for every 10% of used memory instead?

Because it's completely another topic. You have to maek another patch.



>> Also, this patch move finding sacrifice child logic into
>> select_bad_process(). It's necessary to implement adequate
>> no root bonus recalculation. and it makes good side effect,
>> current logic doesn't behave as the doc.
>>
>
> This is unnecessary and just makes the oom killer egregiously long.  We
> are already diagnosing problems here at Google where the oom killer holds
> tasklist_lock on the readside for far too long, causing other cpus waiting
> for a write_lock_irq(&tasklist_lock) to encounter issues when irqs are
> disabled and it is spinning.  A second tasklist scan is simply a
> non-starter.
>
>   [ This is also one of the reasons why we needed to introduce
>     mm->oom_disable_count to prevent a second, expensive tasklist scan. ]

You misunderstand the code. Both select_bad_process() and oom_kill_process()
are under tasklist_lock(). IOW, no change lock holding time.


>> Documentation/sysctl/vm.txt says
>>
>>      oom_kill_allocating_task
>>
>>      If this is set to non-zero, the OOM killer simply kills the task that
>>      triggered the out-of-memory condition.  This avoids the expensive
>>      tasklist scan.
>>
>> IOW, oom_kill_allocating_task shouldn't search sacrifice child.
>> This patch also fixes this issue.
>>
>
> oom_kill_allocating_task was introduced for SGI to prevent the expensive
> tasklist scan, the task that is actually allocating the memory isn't
> actually interesting and is usually random.  This should be turned into a
> documentation fix rather than changing the implementation.

No benefit. I don't take it.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/