linux-kernel - Re: Linux killed Kenny, bastard!

Message-ID: <alpine.DEB.2.00.0901130134090.25386@chino.kir.corp.google.com>
Date:	Tue, 13 Jan 2009 01:54:02 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Evgeniy Polyakov <zbr@...emap.net>
cc:	Bill Davidsen <davidsen@....com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: Linux killed Kenny, bastard!

On Tue, 13 Jan 2009, Evgeniy Polyakov wrote:

> It is a theory, not a practice. OOM-killer most of time starts from ssh,
> database and lighttpd on the tested machines, when it could start in
> the reverse order and do not touch ssh at all. Better not from daemon
> itself, but its fastcgi spawned processes.
> 

In the unconstrained system-wide oom case, the oom killer scans each task
on the system (which can take a very long time, ask SGI) and computes a
badness score for each one.  Once it has identified a memory-hogging task,
a choice you have complete control over from userspace by tuning
/proc/pid/oom_adj, it attempts to kill a child first if doing so will free
memory without killing the parent.
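
To make that selection order concrete, here is a toy model in Python.  It
is not the kernel's code: the Task class, the mm field and the scoring
formula are all simplifying assumptions of mine (the real badness()
heuristic weighs more than memory size).  It only illustrates the two
steps described above: pick the highest-scoring task, then prefer a child
that has its own address space.

```python
from dataclasses import dataclass, field

OOM_DISABLE = -17  # oom_adj value that exempts a task from selection


@dataclass
class Task:
    name: str
    total_vm: int            # memory footprint, arbitrary units
    oom_adj: int = 0         # value of /proc/<pid>/oom_adj
    mm: int = 0              # hypothetical id of the task's address space
    children: list = field(default_factory=list)


def badness(task):
    """Toy score: memory size shifted left or right by oom_adj."""
    if task.oom_adj == OOM_DISABLE:
        return 0
    if task.oom_adj > 0:
        return max(task.total_vm, 1) << task.oom_adj
    return task.total_vm >> -task.oom_adj


def choose_victim(tasks):
    """Pick the worst task, then prefer one of its children."""
    candidate = max(tasks, key=badness)
    if badness(candidate) == 0:
        return None  # every task is exempt or scores zero
    for child in candidate.children:
        # Killing a child only frees memory if it does not share
        # the parent's address space.
        if child.mm != candidate.mm:
            return child
    return candidate
```

With sshd at oom_adj -17 and a web server whose worker has its own
address space, the model returns the worker rather than sshd or the
server itself.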

> I agree, that there are ways to tune the way oom-killer selects the
> victim, and likely after hours of games this subtly will work for the
> specified workload.

It doesn't involve "hours of games"; it is a very simple heuristic that 
you can easily tune to specify your preferences.

What you're looking for with your patch is a way to specify an oom 
preference before the task has been forked, but the current logic already 
covers that: oom_adj scores are inherited, and preference is given to 
killing a child before its parent.
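
As a minimal sketch of what that looks like from userspace, the Python
below sets oom_adj in a freshly forked child just before exec, so the
spawned program and anything it forks later start with that score.  The
helper names (set_oom_adj, spawn_with_oom_adj) and the proc_root parameter
are mine, added so the write can be pointed at a scratch directory for
testing.  The -17..15 range is the oom_adj range as I understand it, and
lowering a score normally needs privilege.

```python
import os

OOM_ADJ_MIN = -17  # -17 exempts the task from the oom killer
OOM_ADJ_MAX = 15   # most preferred victim


def set_oom_adj(value, pid="self", proc_root="/proc"):
    """Write an oom_adj value for pid and return the path written."""
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError("oom_adj must be between -17 and 15")
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path


def spawn_with_oom_adj(argv, value):
    """Fork, set oom_adj in the child, then exec argv. Returns child pid."""
    pid = os.fork()
    if pid == 0:
        try:
            set_oom_adj(value)          # applies to the child only
            os.execvp(argv[0], argv)    # does not return on success
        finally:
            os._exit(127)               # reached only if something failed
    return pid
```

For example, spawn_with_oom_adj(["php-cgi"], 10) would make the fastcgi
workers likelier victims than the daemon that launched them, with no name
matching involved.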

> What I propose is the simplest way for the most
> commonly used case.

No, procfs is the correct interface for tuning oom kill preferences, not 
name parsing.

With oom_adj scores, you have the ability to specify oom kill preferences 
within a cpuset or memory controller as well, whereas oom_victim_name is 
global and very costly in select_bad_process() when no matching task is 
found.

> It is a help for the admin and not the force to
> invent complex machinery which will be error-prone and hard to debug
> when eventually oom happens.

It's very simple to debug the oom killer's decisions, which is why I 
introduced /proc/sys/vm/oom_dump_tasks.

Your patch also requires two expensive scans of the entire tasklist (I 
introduced /proc/sys/vm/oom_kill_allocating_task specifically to avoid 
_one_ expensive scan) when oom_victim_name isn't found.
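
For reference, both sysctls mentioned above are plain integer files under
/proc/sys/vm, so toggling them is a one-line write.  A small sketch in
Python follows; the helper names and the vm_root parameter are mine (the
latter only so the code can be exercised against a scratch directory), and
writing to the real files requires root.

```python
import os


def read_vm_knob(name, vm_root="/proc/sys/vm"):
    """Return the integer value of a vm sysctl file."""
    with open(os.path.join(vm_root, name)) as f:
        return int(f.read().strip())


def write_vm_knob(name, enabled, vm_root="/proc/sys/vm"):
    """Write 1 or 0 to a vm sysctl file."""
    with open(os.path.join(vm_root, name), "w") as f:
        f.write("1\n" if enabled else "0\n")
```

write_vm_knob("oom_dump_tasks", True) asks for a task dump in the kernel
log when the oom killer runs, and
write_vm_knob("oom_kill_allocating_task", True) makes it kill the task
that triggered the oom instead of scanning the tasklist.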