Message-ID: <4EEF74AC.1060503@gmail.com>
Date:	Mon, 19 Dec 2011 12:30:20 -0500
From:	KOSAKI Motohiro <kosaki.motohiro@...il.com>
To:	Anton Vorontsov <anton.vorontsov@...aro.org>
CC:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Arve Hjønnevåg 
	<arve@...roid.com>, Rik van Riel <riel@...hat.com>,
	Pavel Machek <pavel@....cz>,
	Greg Kroah-Hartman <gregkh@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	Michal Hocko <mhocko@...e.cz>,
	John Stultz <john.stultz@...aro.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: Android low memory killer vs. memory pressure notifications

> Personally I'd start thinking about the new [lightweight] notification
> stuff, i.e. something without mem_cg's downsides. Though, I'm Cc'ing
> Android folks so maybe they could enlighten us why in-kernel "lowmemory
> manager" might be a better idea. Plus Cc'ing folks that I think might
> be interested in this discussion.
>
> Thanks!
>
> p.s.
>
> I'm inlining the android memory killer code down below, just for the
> reference. It is quite small (and useful... though, currently only for
> Android case).
>
> - - - -
> From: Arve Hjønnevåg <arve@...roid.com>
> Subject: Android low memory killer driver
>
> The lowmemorykiller driver lets user-space specify a set of memory thresholds
> where processes with a range of oom_adj values will get killed. Specify the
> minimum oom_adj values in /sys/module/lowmemorykiller/parameters/adj and the
> number of free pages in /sys/module/lowmemorykiller/parameters/minfree. Both
> files take a comma separated list of numbers in ascending order.
>
> For example, write "0,8" to /sys/module/lowmemorykiller/parameters/adj and
> "1024,4096" to /sys/module/lowmemorykiller/parameters/minfree to kill processes
> with a oom_adj value of 8 or higher when the free memory drops below 4096 pages
> and kill processes with a oom_adj value of 0 or higher when the free memory
> drops below 1024 pages.
>
> The driver considers memory used for caches to be free, but if a large
> percentage of the cached memory is locked this can be very inaccurate
> and processes may not get killed until the normal oom killer is triggered.
>
> ---
>   mm/Kconfig           |    7 ++
>   mm/Makefile          |    1 +
>   mm/lowmemorykiller.c |  175 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 183 insertions(+), 0 deletions(-)
>   create mode 100644 mm/lowmemorykiller.c
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 011b110..a2e7959 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -259,6 +259,12 @@ config DEFAULT_MMAP_MIN_ADDR
>   	  This value can be changed after boot using the
>   	  /proc/sys/vm/mmap_min_addr tunable.
>
> +config LOW_MEMORY_KILLER
> +	bool "Low Memory Killer"
> +	help
> +	  The lowmemorykiller driver lets user-space specify a set of memory
> +	  thresholds where processes will get killed.
> +
>   config ARCH_SUPPORTS_MEMORY_FAILURE
>   	bool
>
> diff --git a/mm/Makefile b/mm/Makefile
> index 50ec00e..10fb4ff 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -47,6 +47,7 @@ obj-$(CONFIG_QUICKLIST) += quicklist.o
>   obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o
>   obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
>   obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
> +obj-$(CONFIG_LOW_MEMORY_KILLER)	+= lowmemorykiller.o
>   obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
>   obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
>   obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
> diff --git a/mm/lowmemorykiller.c b/mm/lowmemorykiller.c
> new file mode 100644
> index 0000000..4e51936
> --- /dev/null
> +++ b/mm/lowmemorykiller.c
> @@ -0,0 +1,175 @@
> +/*
> + * The lowmemorykiller driver lets user-space specify a set of memory thresholds
> + * where processes with a range of oom_adj values will get killed. Specify the
> + * minimum oom_adj values in /sys/module/lowmemorykiller/parameters/adj and the
> + * number of free pages in /sys/module/lowmemorykiller/parameters/minfree. Both
> + * files take a comma separated list of numbers in ascending order.
> + *
> + * For example, write "0,8" to /sys/module/lowmemorykiller/parameters/adj and
> + * "1024,4096" to /sys/module/lowmemorykiller/parameters/minfree to kill processes
> + * with a oom_adj value of 8 or higher when the free memory drops below 4096 pages
> + * and kill processes with a oom_adj value of 0 or higher when the free memory
> + * drops below 1024 pages.
> + *
> + * The driver considers memory used for caches to be free, but if a large
> + * percentage of the cached memory is locked this can be very inaccurate
> + * and processes may not get killed until the normal oom killer is triggered.
> + *
> + * Copyright (C) 2007-2008 Google, Inc.
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/oom.h>
> +#include <linux/sched.h>
> +#include <linux/notifier.h>
> +
> +static uint32_t lowmem_debug_level = 2;
> +static int lowmem_adj[6] = {
> +	0,
> +	1,
> +	6,
> +	12,
> +};
> +static int lowmem_adj_size = 4;
> +static size_t lowmem_minfree[6] = {
> +	3 * 512,	/* 6MB */
> +	2 * 1024,	/* 8MB */
> +	4 * 1024,	/* 16MB */
> +	16 * 1024,	/* 64MB */
> +};
> +static int lowmem_minfree_size = 4;
> +
> +#define lowmem_print(level, x...)			\
> +	do {						\
> +		if (lowmem_debug_level >= (level))	\
> +			printk(x);			\
> +	} while (0)
> +
> +static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
> +{
> +	struct task_struct *p;
> +	struct task_struct *selected = NULL;
> +	int rem = 0;
> +	int tasksize;
> +	int i;
> +	int min_adj = OOM_ADJUST_MAX + 1;
> +	int selected_tasksize = 0;
> +	int selected_oom_adj;
> +	int array_size = ARRAY_SIZE(lowmem_adj);
> +	int other_free = global_page_state(NR_FREE_PAGES);
> +	int other_file = global_page_state(NR_FILE_PAGES) -
> +						global_page_state(NR_SHMEM);
> +
> +	if (lowmem_adj_size < array_size)
> +		array_size = lowmem_adj_size;
> +	if (lowmem_minfree_size < array_size)
> +		array_size = lowmem_minfree_size;
> +	for (i = 0; i < array_size; i++) {
> +		if (other_free < lowmem_minfree[i] &&
> +		    other_file < lowmem_minfree[i]) {
> +			min_adj = lowmem_adj[i];
> +			break;
> +		}
> +	}
> +	if (sc->nr_to_scan > 0)
> +		lowmem_print(3, "lowmem_shrink %lu, %x, ofree %d %d, ma %d\n",
> +			     sc->nr_to_scan, sc->gfp_mask, other_free, other_file,
> +			     min_adj);
> +	rem = global_page_state(NR_ACTIVE_ANON) +
> +		global_page_state(NR_ACTIVE_FILE) +
> +		global_page_state(NR_INACTIVE_ANON) +
> +		global_page_state(NR_INACTIVE_FILE);

This seems incorrect. Killing a process frees only its anon pages, not the
file cache.
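
To make the objection concrete, here is a small user-space model of the
"rem" calculation. It is only a sketch: struct lru_counts and both function
names are made up for illustration, and "anon only" is one reading of the
comment above, not a proposed replacement for the patch.

```c
#include <assert.h>

/* Hypothetical snapshot of the four LRU counters the quoted code sums. */
struct lru_counts {
	long active_anon;
	long inactive_anon;
	long active_file;
	long inactive_file;
};

/* As posted: every LRU page is reported as reclaimable by the shrinker. */
static long rem_as_posted(const struct lru_counts *c)
{
	assert(c);
	return c->active_anon + c->active_file +
	       c->inactive_anon + c->inactive_file;
}

/* Reading of the objection: a kill gives back anon pages only. */
static long rem_anon_only(const struct lru_counts *c)
{
	assert(c);
	return c->active_anon + c->inactive_anon;
}
```

The gap between the two results is the file cache that the shrinker
reports as its own but that killing a task would not release.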


> +	if (sc->nr_to_scan <= 0 || min_adj == OOM_ADJUST_MAX + 1) {
> +		lowmem_print(5, "lowmem_shrink %lu, %x, return %d\n",
> +			     sc->nr_to_scan, sc->gfp_mask, rem);
> +		return rem;
> +	}
> +	selected_oom_adj = min_adj;
> +
> +	read_lock(&tasklist_lock);

Crazy inefficient. A mere slab shrinker shouldn't take tasklist_lock.
Imagine a system with a very large number of tasks...
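
The cost in question is a full walk over every task on each shrinker
call. A user-space model of the selection rule in the quoted loop, as I
read it (struct task_model and select_victim() are hypothetical names, and
no kernel locking is modelled here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields the quoted loop reads per task. */
struct task_model {
	int pid;
	int oom_adj;
	long rss;	/* pages, what get_mm_rss() would return */
};

/*
 * Walk all n tasks and pick the one with the highest oom_adj that is
 * >= min_adj, breaking ties by larger rss. Returns NULL if none match.
 */
static const struct task_model *
select_victim(const struct task_model *t, size_t n, int min_adj)
{
	const struct task_model *selected = NULL;
	size_t i;

	assert(t || n == 0);
	for (i = 0; i < n; i++) {
		if (t[i].oom_adj < min_adj || t[i].rss <= 0)
			continue;
		if (selected) {
			if (t[i].oom_adj < selected->oom_adj)
				continue;
			if (t[i].oom_adj == selected->oom_adj &&
			    t[i].rss <= selected->rss)
				continue;
		}
		selected = &t[i];
	}
	return selected;
}
```

The walk is O(n) in the number of tasks no matter how few candidates
there are, and in the patch all of it runs under read_lock(&tasklist_lock).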

Moreover, if the system has plenty of file cache, no process should be
killed at all! That's a fundamental downside of this patch.
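
For reference, a user-space model of the threshold loop in the quoted
lowmem_shrink(), fed with the "0,8" / "1024,4096" example from the patch
description. pick_min_adj() is a hypothetical name, and OOM_ADJUST_MAX is
hard-coded to 15 on the assumption that it matches <linux/oom.h> of that
time. It only shows how the posted code combines the free and file-backed
counts; whether that is enough is the question raised above.

```c
#include <assert.h>
#include <stddef.h>

#define OOM_ADJUST_MAX 15	/* assumed to match <linux/oom.h> */

/*
 * Return the minimum oom_adj eligible for killing, or OOM_ADJUST_MAX + 1
 * when no threshold is crossed (which means "kill nothing").
 * other_file models NR_FILE_PAGES - NR_SHMEM from the quoted code.
 */
static int pick_min_adj(const int *adj, const long *minfree, size_t n,
			long other_free, long other_file)
{
	size_t i;

	assert(adj && minfree);
	for (i = 0; i < n; i++) {
		/* Both counts must be below the same threshold. */
		if (other_free < minfree[i] && other_file < minfree[i])
			return adj[i];
	}
	return OOM_ADJUST_MAX + 1;
}
```

With adj = {0, 8} and minfree = {1024, 4096}, 500 free and 500 file pages
select oom_adj 0, while 2000 and 3000 select oom_adj 8.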


> +	for_each_process(p) {
> +		struct mm_struct *mm;
> +		struct signal_struct *sig;
> +		int oom_adj;
> +
> +		task_lock(p);
> +		mm = p->mm;
> +		sig = p->signal;
> +		if (!mm || !sig) {
> +			task_unlock(p);
> +			continue;
> +		}
> +		oom_adj = sig->oom_adj;
> +		if (oom_adj < min_adj) {
> +			task_unlock(p);
> +			continue;
> +		}
> +		tasksize = get_mm_rss(mm);
> +		task_unlock(p);
> +		if (tasksize <= 0)
> +			continue;
> +		if (selected) {
> +			if (oom_adj < selected_oom_adj)
> +				continue;
> +			if (oom_adj == selected_oom_adj &&
> +			    tasksize <= selected_tasksize)
> +				continue;
> +		}
> +		selected = p;
> +		selected_tasksize = tasksize;
> +		selected_oom_adj = oom_adj;
> +		lowmem_print(2, "select %d (%s), adj %d, size %d, to kill\n",
> +			     p->pid, p->comm, oom_adj, tasksize);
> +	}
> +	if (selected) {
> +		lowmem_print(1, "send sigkill to %d (%s), adj %d, size %d\n",
> +			     selected->pid, selected->comm,
> +			     selected_oom_adj, selected_tasksize);
> +		force_sig(SIGKILL, selected);

Scarily naive assumption. Sending SIGKILL does not guarantee that the
process dies immediately if the task is stuck in the kernel.
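
A user-space analogy of the same point, as a sketch only: kill() returning
means the signal was queued, not that the target is gone, so the caller
has to confirm the exit separately. spawn_sleeper() and kill_and_confirm()
are hypothetical helpers. A task blocked in uninterruptible sleep cannot
be reproduced this way; the timeout branch stands in for that case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fork a child that sleeps until it is signalled. Returns its pid. */
static pid_t spawn_sleeper(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;
}

/*
 * Send SIGKILL to a child, then poll until it has actually exited.
 * Returns 0 once the child is reaped, 1 if it is still around after
 * roughly timeout_ms milliseconds of sleeping, -1 on error.
 */
static int kill_and_confirm(pid_t pid, int timeout_ms)
{
	int waited_ms = 0;

	assert(timeout_ms >= 0);
	if (kill(pid, SIGKILL) < 0)
		return -1;
	for (;;) {
		int status;
		struct timespec ts = { 0, 1000000 };	/* 1 ms */
		pid_t r = waitpid(pid, &status, WNOHANG);

		if (r == pid)
			return 0;
		if (r < 0)
			return -1;
		if (waited_ms >= timeout_ms)
			return 1;
		nanosleep(&ts, NULL);
		waited_ms++;
	}
}
```

The quoted code subtracts selected_tasksize from rem right after
force_sig(), which corresponds to skipping the polling loop entirely.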


> +		rem -= selected_tasksize;
> +	}
> +	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
> +		     sc->nr_to_scan, sc->gfp_mask, rem);
> +	read_unlock(&tasklist_lock);
> +	return rem;
> +}
> +
> +static struct shrinker lowmem_shrinker = {
> +	.shrink = lowmem_shrink,
> +	.seeks = DEFAULT_SEEKS * 16
> +};
> +
> +static int __init lowmem_init(void)
> +{
> +	register_shrinker(&lowmem_shrinker);
> +	return 0;
> +}
> +
> +static void __exit lowmem_exit(void)
> +{
> +	unregister_shrinker(&lowmem_shrinker);
> +}
> +
> +module_param_named(cost, lowmem_shrinker.seeks, int, S_IRUGO | S_IWUSR);
> +module_param_array_named(adj, lowmem_adj, int, &lowmem_adj_size,
> +			 S_IRUGO | S_IWUSR);
> +module_param_array_named(minfree, lowmem_minfree, uint, &lowmem_minfree_size,
> +			 S_IRUGO | S_IWUSR);
> +module_param_named(debug_level, lowmem_debug_level, uint, S_IRUGO | S_IWUSR);
> +
> +module_init(lowmem_init);
> +module_exit(lowmem_exit);
> +
> +MODULE_LICENSE("GPL");
