Subject: [PATCH] mm: make working set portion that is protected tunable
From: Christian Ehrhardt

In discussion with Rik van Riel and Johannes Weiner we concluded that some
workloads want the current "save 50%" protection of the working set all the
time, while others would benefit from protecting only a smaller amount.

Ultimately, no ratio carved in stone in the kernel will match all use cases,
therefore this patch makes the value tunable via a /proc/sys/vm/ interface
named active_inactive_ratio.

Example configurations might be:
- 50% - like the current kernel
- 0%  - like a kernel pre "56e49d21 vmscan: evict use-once pages first"
- x%  - any other percentage to allow customizing the system to its needs.

Based on our experiments the suggested default in this patch is 25%, but if
preferred I'm fine with keeping 50% and letting admins/distros adapt as needed.

Signed-off-by: Christian Ehrhardt
---

Index: linux-2.6/Documentation/sysctl/vm.txt
===================================================================
--- linux-2.6.orig/Documentation/sysctl/vm.txt	2010-04-21 06:32:23.000000000 +0200
+++ linux-2.6/Documentation/sysctl/vm.txt	2010-04-21 07:24:35.000000000 +0200
@@ -18,6 +18,7 @@
 
 Currently, these files are in /proc/sys/vm:
 
+- active_inactive_ratio
 - block_dump
 - dirty_background_bytes
 - dirty_background_ratio
@@ -57,6 +58,15 @@
 
 ==============================================================
 
+active_inactive_ratio
+
+The kernel tries to protect the active working set. Therefore a portion of the
+file pages is protected, meaning they are omitted when evicting pages until this
+ratio is reached.
+This tunable represents that ratio in percent and specifies the protected part.
+
+==============================================================
+
 block_dump
 
 block_dump enables block I/O debugging when set to a nonzero value. More
Index: linux-2.6/kernel/sysctl.c
===================================================================
--- linux-2.6.orig/kernel/sysctl.c	2010-04-21 06:33:43.000000000 +0200
+++ linux-2.6/kernel/sysctl.c	2010-04-21 07:26:35.000000000 +0200
@@ -1271,6 +1271,15 @@
 		.extra2		= &one,
 	},
 #endif
+	{
+		.procname	= "active_inactive_ratio",
+		.data		= &sysctl_active_inactive_ratio,
+		.maxlen		= sizeof(sysctl_active_inactive_ratio),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one_hundred,
+	},
 
 /*
  * NOTE: do not add new entries to this table unless you have read
Index: linux-2.6/mm/memcontrol.c
===================================================================
--- linux-2.6.orig/mm/memcontrol.c	2010-04-21 06:31:29.000000000 +0200
+++ linux-2.6/mm/memcontrol.c	2010-04-21 09:00:22.000000000 +0200
@@ -893,12 +893,12 @@
 int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
 {
 	unsigned long active;
-	unsigned long inactive;
+	unsigned long file;
 
-	inactive = mem_cgroup_get_local_zonestat(memcg, LRU_INACTIVE_FILE);
 	active = mem_cgroup_get_local_zonestat(memcg, LRU_ACTIVE_FILE);
+	file = active + mem_cgroup_get_local_zonestat(memcg, LRU_INACTIVE_FILE);
 
-	return (active > inactive);
+	return (active > file * sysctl_active_inactive_ratio / 100);
 }
 
 unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c	2010-04-21 06:31:29.000000000 +0200
+++ linux-2.6/mm/vmscan.c	2010-04-21 09:00:13.000000000 +0200
@@ -1459,14 +1459,23 @@
 	return low;
 }
 
+/*
+ * sysctl_active_inactive_ratio
+ *
+ * Defines which portion of the file pages within the active working set is
+ * protected. The value is the percentage that will be protected.
+ */
+int sysctl_active_inactive_ratio __read_mostly = 25;
+
 static int inactive_file_is_low_global(struct zone *zone)
 {
-	unsigned long active, inactive;
+	unsigned long active, file;
 
 	active = zone_page_state(zone, NR_ACTIVE_FILE);
-	inactive = zone_page_state(zone, NR_INACTIVE_FILE);
+	file = active + zone_page_state(zone, NR_INACTIVE_FILE);
+
+	return (active > file * sysctl_active_inactive_ratio / 100);
 
-	return (active > inactive);
 }
 
 /**
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2010-04-21 09:02:37.000000000 +0200
+++ linux-2.6/include/linux/mm.h	2010-04-21 09:02:51.000000000 +0200
@@ -1467,5 +1467,7 @@
 
 extern void dump_page(struct page *page);
 
+extern int sysctl_active_inactive_ratio;
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
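
For readers who want to try the arithmetic outside the kernel, the comparison
introduced in mm/vmscan.c and mm/memcontrol.c can be sketched as a small
userspace function. This is only an illustration: the function name
active_file_exceeds_ratio and its parameters are hypothetical and not part of
the patch, and the page counts are passed in directly instead of being read
from zone or memcg statistics.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the check added by the patch.
 * Returns true when the active file pages exceed ratio_percent of all
 * file pages (active + inactive).
 */
bool active_file_exceeds_ratio(unsigned long active,
			       unsigned long inactive,
			       int ratio_percent)
{
	unsigned long file = active + inactive;

	/* The sysctl table entry limits the value to 0..100. */
	assert(ratio_percent >= 0 && ratio_percent <= 100);

	/* Same integer arithmetic as the patch: multiply, then divide. */
	return active > file * (unsigned long)ratio_percent / 100;
}
```

With 1000 file pages and the proposed default of 25, the threshold is 250
active pages. A value of 50 gives a threshold of half the file pages, which
corresponds to the previous active > inactive comparison, and a value of 0
makes the check true as soon as any active file page exists.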