Date:	Mon, 20 Jul 2015 09:59:04 +0530
From:	Pintu Kumar <pintu.k@...sung.com>
To:	akpm@...ux-foundation.org, corbet@....net, vbabka@...e.cz,
	gorcunov@...nvz.org, pintu.k@...sung.com, mhocko@...e.cz,
	emunson@...mai.com, kirill.shutemov@...ux.intel.com,
	standby24x7@...il.com, hannes@...xchg.org, vdavydov@...allels.com,
	hughd@...gle.com, minchan@...nel.org, tj@...nel.org,
	rientjes@...gle.com, xypron.glpk@....de, dzickus@...hat.com,
	prarit@...hat.com, ebiederm@...ssion.com, rostedt@...dmis.org,
	uobergfe@...hat.com, paulmck@...ux.vnet.ibm.com,
	iamjoonsoo.kim@....com, ddstreet@...e.org, sasha.levin@...cle.com,
	koct9i@...il.com, mgorman@...e.de, cj@...ux.com,
	opensource.ganesh@...il.com, vinmenon@...eaurora.org,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, linux-pm@...r.kernel.org, qiuxishi@...wei.com,
	Valdis.Kletnieks@...edu
Cc:	cpgs@...sung.com, pintu_agarwal@...oo.com, vishnu.ps@...sung.com,
	rohit.kr@...sung.com, iqbal.ams@...sung.com, pintu.ping@...il.com,
	pintu.k@...look.com
Subject: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

This patch provides 2 things:
1. Add a new control called shrink_memory in /proc/sys/vm/.
This control can be used to aggressively reclaim memory system-wide
in one shot from user space. Writing 1 instructs the kernel to try to
reclaim up to totalram_pages pages.
Example: echo 1 > /proc/sys/vm/shrink_memory

Writing any value other than 1 to shrink_memory fails with EINVAL.

2. Enable the shrink_all_memory API in the kernel with a new
CONFIG_SHRINK_MEMORY option.
Currently, the shrink_all_memory function is used only during hibernation.
With the new config option, this API can be used in the non-hibernation case
also without disturbing the hibernation case.
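
For illustration, the control could also be driven from a C program
instead of a shell. The sketch below is not part of this patch;
write_sysctl() and trigger_shrink_memory() are hypothetical helper names.
With the patch applied, a caller would be expected to see -EINVAL for any
value other than 1, -ENOENT when CONFIG_SHRINK_MEMORY is not set, and
-EACCES when not running as root (the file mode is 0200).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a value string to a sysctl file.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_sysctl(const char *path, const char *value)
{
	size_t len;
	ssize_t n;
	int fd, err = 0;

	assert(path != NULL && value != NULL);
	len = strlen(value);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* A sysctl write is expected to be consumed in a single call. */
	n = write(fd, value, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}

/* Hypothetical wrapper: request a one-shot system-wide reclaim. */
int trigger_shrink_memory(void)
{
	return write_sysctl("/proc/sys/vm/shrink_memory", "1\n");
}
```

A boot-time helper could call trigger_shrink_memory() once and log any
negative return value instead of treating it as fatal.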

The details were presented at the Embedded Linux Conference, March 2015:
http://events.linuxfoundation.org/sites/events/files/slides/%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf

An example is shown below:
Device: ARMv7, Dual Core CPU 1.2GHz
RAM: 512MB (Without SWAP/ZRAM)
Linux Kernel: 3.10.17
Scenario: Just after boot-up finished.

BEFORE:
-------------------------------------------------------------------------
shell> free -tm ; cat /proc/buddyinfo
             total       used       free     shared    buffers     cached
Mem:           460        440         20          0         35        154
-/+ buffers/cache:        250        209
Swap:            0          0          0
Total:         460        440         20
Node 0, zone   Normal   1037    705     92     19     19     17      4      9      0      0      0

shell> vmstat 1 &

AFTER:
-------------------------------------------------------------------------
shell> echo 1 > /proc/sys/vm/shrink_memory

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0  20768  35876 157876    0    0     0     0   64  177  0  1 99  0  0
--------------------------------------------------------------------------------
|1  0      0  33104  34864 149808    0    0     0     0   82  221  0 12 88  0  0|
--------------------------------------------------------------------------------
 0  0      0 188776   3000  54420    0    0     0     0  216  374  0 30 70  0  0
 0  0      0 188400   3652  54528    0    0   740     8  188  337  2  1 95  2  0

shell> free -tm ; cat /proc/buddyinfo
             total       used       free     shared    buffers     cached
Mem:           460        278        182          0          4         54
-/+ buffers/cache:        219        240
Swap:            0          0          0
Total:         460        278        182
Node 0, zone   Normal   5575   3158   1500    727    240     90     33     18     10      6      6

RESULTS:
-----------------------------------------------------
Around 160MB of memory was recovered in one shot.
Many higher-order pages were recovered in the process.
From the vmstat output, system CPU usage was ~12% during the 1-second
sample in which the command ran.
We also measured the power consumption using a H/W power monitor tool.
The result is below:
Before - ~180mA
During shrink memory - ~237mA
Duration - ~0.5 sec
Increase - ~57mA
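
The "around 160MB" figure can be cross-checked against the buddyinfo
lines above. In /proc/buddyinfo, column k after the zone name counts free
blocks of 2^k pages, so the free page total is the sum of count << k.
The sketch below is not part of this patch; buddyinfo_free_pages() is a
hypothetical helper name, and the MB conversion assumes 4KB pages.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the free pages described by one /proc/buddyinfo line.
 * Returns the total in base pages, or -1 if the line is malformed.
 */
long buddyinfo_free_pages(const char *line)
{
	const char *p;
	char *end;
	long total = 0;
	int order = 0;

	assert(line != NULL);
	p = strstr(line, "zone");
	if (!p)
		return -1;
	p += strlen("zone");

	/* Skip the whitespace and the zone name that follow "zone". */
	while (*p == ' ' || *p == '\t')
		p++;
	while (*p && *p != ' ' && *p != '\t')
		p++;

	/* Each remaining column is the free block count for one order. */
	for (;;) {
		long count = strtol(p, &end, 10);

		if (end == p)
			break;
		total += count << order;
		order++;
		p = end;
	}
	return order ? total : -1;
}
```

Applied to the two "Node 0, zone Normal" lines above, this gives 5223
pages before and 46619 pages after, a gain of 41396 pages. With 4KB pages
that is roughly 162MB, in line with the change in the "free" column.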

FURTHER OBSERVATIONS:
-----------------------------------------------------
Around 37% fewer application kills when shrink_memory is called at boot-up.
Around 4000 fewer page faults.
Around 43% fewer kswapd calls.
Movement to the slowpath is reduced drastically.
Combining shrink_memory with compaction shows good benefits against
fragmentation.

APPLICATION LAUNCH BEHAVIOR:
-----------------------------------------------------
During First Launch:
============================================================================
Application	Before_shrink_memory	After_shrink_memory	Difference
Camera		1.981			1.86			0.121
Gallery		1.276			0.94			0.336
contacts	1.112			0.941			0.171
messaging	0.886			0.795			0.091
settings	1.257			1.212			0.045
Music		1.854			2.098			-0.244
Gmail		1.872			1.935			-0.063
Browser		2.569			2.677			-0.108
============================================================================

During Re-launch:
============================================================================
Application	Before_shrink_memory	After_shrink_memory	Difference
Camera		1.248			0.976			0.272
Gallery		0.697			0.633			0.064
contacts	0.506			0.561			-0.055
messaging	0.533			0.489			0.044
settings	0.833			0.805			0.028
Music		0.832			0.769			0.063
Gmail		0.913			0.841			0.072
Browser		0.579			0.57			0.009
============================================================================

Various other use cases where this can be used:
----------------------------------------------------------------------------
1) Just after system boot-up is finished, using the sysctl configuration from
   bootup script.
2) During system suspend, after suspend_freeze_processes()
   [kernel/power/suspend.c],
   based on some condition on fragmentation or free memory state.
3) From the Android ION system heap driver, when order-4 allocations start
   failing, by calling shrink_all_memory in a separate worker thread, based
   on some condition.
4) It can be combined with compact_memory to achieve better results on memory
   fragmentation.
5) It can be helpful in debugging and tuning various vm parameters.
6) It can help identify the maximum amount of memory that is reclaimable
   at any point in time,
   and how many higher-order pages could be formed from this amount of
   reclaimable memory.
   Thus it can help in tuning the reserved memory needs of a system
   accordingly.
7) It can be helpful in properly tuning the SWAP size in the system.
   In shrink_all_memory, we set may_swap = 1, which means all unused pages
   will be swapped out.
   Thus, by running shrink_memory on a heavily loaded system, we can check
   how full the swap gets.
   That, with a 10% delta, can be taken as the maximum swap size.
   Also, if ZRAM is used, it helps in compressing and storing the pages for
   later use.
8) It can help allow more new applications to be launched without
   killing the older ones,
   by moving the least recently used pages to the SWAP area.
   Thus user data can be retained.
9) Can be part of a system utility to quickly defragment entire system
   memory.
10) This may also help in reducing fragmentation within CMA region.
11) More use cases can be identified.

Most importantly, it is most effective when applied intelligently, based
on certain conditions.
It should not be executed all the time; the decision is left up to the user.
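
As one illustration of "based on certain conditions", a user-space policy
might write to the control only when free memory is low. This is a minimal
sketch and not part of this patch: meminfo_kb() and should_shrink() are
hypothetical names, and the MemFree threshold is an arbitrary example, not
a recommended value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract a "<key>: <n> kB" field from /proc/meminfo-style text.
 * Returns the value in kB, or -1 if the key is missing or malformed.
 */
long meminfo_kb(const char *text, const char *key)
{
	size_t klen;
	const char *line;
	long val;

	assert(text != NULL && key != NULL);
	klen = strlen(key);

	for (line = text; line && *line; ) {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			if (sscanf(line + klen + 1, "%ld", &val) == 1)
				return val;
			return -1;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Hypothetical policy: shrink only when MemFree is below min_free_kb. */
int should_shrink(const char *meminfo_text, long min_free_kb)
{
	long free_kb = meminfo_kb(meminfo_text, "MemFree");

	return free_kb >= 0 && free_kb < min_free_kb;
}
```

With a 32768 kB threshold, the "free" value of 20768 kB from the vmstat
output before the shrink would trigger a reclaim, while the 188776 kB
seen afterwards would not.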

Signed-off-by: Pintu Kumar <pintu.k@...sung.com>
---
V3: Corrected a small typo at the end of the commit message.

V2: Added min,max parameters for shrink_memory, suggested by
    Heinrich Schuchardt <xypron.glpk@....de>.
    Added error handling in sysctl_shrinkmem_handler for any value other
    than 1, suggested by Heinrich Schuchardt <xypron.glpk@....de>.
    Fixed HIBERNATION+SHRINK_MEMORY issue in shrink_all_memory,
    suggested by Valdis.Kletnieks@...edu.
    Restored gfp_mask to the original, because of other dependencies.
    Also, adding GFP_RECLAIM_MASK does not affect anything.
    Verified power consumption data during shrink_memory,
    as suggested by Johannes Weiner <hannes@...xchg.org>.
    Verified application launch/re-launch scenarios before/after shrink_memory,
    as suggested by Xishi Qiu <qiuxishi@...wei.com>.
    Updated the commit message with examples and use cases.

 Documentation/sysctl/vm.txt |   18 ++++++++++++++++++
 include/linux/swap.h        |    7 +++++++
 kernel/sysctl.c             |   16 ++++++++++++++++
 mm/Kconfig                  |    8 ++++++++
 mm/vmscan.c                 |   34 ++++++++++++++++++++++++++++++++--
 5 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9832ec5..54eda3a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - panic_on_oom
 - percpu_pagelist_fraction
+- shrink_memory
 - stat_interval
 - swappiness
 - user_reserve_kbytes
@@ -718,6 +719,23 @@ sysctl, it will revert to this default behavior.
 
 ==============================================================
 
+shrink_memory
+
+This control is available only when CONFIG_SHRINK_MEMORY is set. This control
+can be used to aggressively reclaim memory system-wide in one shot. A value of
+1 will instruct the kernel to reclaim as much as totalram_pages in the system.
+For example, to reclaim all memory system-wide we can do:
+# echo 1 > /proc/sys/vm/shrink_memory
+
+If any other value than 1 is written to shrink_memory an error EINVAL occurs.
+
+For more information about this control, see the following presentation
+from the Embedded Linux Conference, 2015:
+http://events.linuxfoundation.org/sites/events/files/slides/
+%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
+
+==============================================================
+
 stat_interval
 
 The time interval between which vm statistics are updated.  The default
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 9a7adfb..6505b0b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -333,6 +333,13 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern unsigned long vm_total_pages;
 
+#ifdef CONFIG_SHRINK_MEMORY
+extern int sysctl_shrink_memory;
+extern int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *length, loff_t *ppos);
+#endif
+
+
 #ifdef CONFIG_NUMA
 extern int zone_reclaim_mode;
 extern int sysctl_min_unmapped_ratio;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c566b56..e66581b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -275,6 +275,11 @@ static int min_extfrag_threshold;
 static int max_extfrag_threshold = 1000;
 #endif
 
+#ifdef CONFIG_SHRINK_MEMORY
+static int min_shrink_memory = 1;
+static int max_shrink_memory = 1;
+#endif
+
 static struct ctl_table kern_table[] = {
 	{
 		.procname	= "sched_child_runs_first",
@@ -1351,6 +1356,17 @@ static struct ctl_table vm_table[] = {
 	},
 
 #endif /* CONFIG_COMPACTION */
+#ifdef CONFIG_SHRINK_MEMORY
+	{
+		.procname	= "shrink_memory",
+		.data		= &sysctl_shrink_memory,
+		.maxlen		= sizeof(int),
+		.mode		= 0200,
+		.proc_handler	= sysctl_shrinkmem_handler,
+		.extra1         = &min_shrink_memory,
+		.extra2         = &max_shrink_memory,
+	},
+#endif
 	{
 		.procname	= "min_free_kbytes",
 		.data		= &min_free_kbytes,
diff --git a/mm/Kconfig b/mm/Kconfig
index b3a60ee..8e04bd9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -657,3 +657,11 @@ config DEFERRED_STRUCT_PAGE_INIT
 	  when kswapd starts. This has a potential performance impact on
 	  processes running early in the lifetime of the systemm until kswapd
 	  finishes the initialisation.
+
+config SHRINK_MEMORY
+	bool "Allow for system-wide shrinking of memory"
+	default n
+	depends on MMU
+	help
+	  It enables support for system-wide memory reclaim in one shot using
+	  echo 1 > /proc/sys/vm/shrink_memory.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c8d8282..e802fa7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,6 +58,10 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
 
+#ifdef CONFIG_SHRINK_MEMORY
+#include <linux/suspend.h>
+#endif
+
 struct scan_control {
 	/* How many pages shrink_list() should reclaim */
 	unsigned long nr_to_reclaim;
@@ -3557,7 +3561,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
 	wake_up_interruptible(&pgdat->kswapd_wait);
 }
 
-#ifdef CONFIG_HIBERNATION
+#if defined(CONFIG_HIBERNATION) || defined(CONFIG_SHRINK_MEMORY)
 /*
  * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
  * freed pages.
@@ -3576,12 +3580,16 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 		.may_writepage = 1,
 		.may_unmap = 1,
 		.may_swap = 1,
-		.hibernation_mode = 1,
 	};
 	struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
 	struct task_struct *p = current;
 	unsigned long nr_reclaimed;
 
+	if (system_entering_hibernation())
+		sc.hibernation_mode = 1;
+	else
+		sc.hibernation_mode = 0;
+
 	p->flags |= PF_MEMALLOC;
 	lockdep_set_current_reclaim_state(sc.gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
@@ -3597,6 +3605,28 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 }
 #endif /* CONFIG_HIBERNATION */
 
+#ifdef CONFIG_SHRINK_MEMORY
+int sysctl_shrink_memory;
+/* This is the entry point for system-wide shrink memory
+ * via /proc/sys/vm/shrink_memory */
+int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *length, loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+	if (ret)
+		return ret;
+
+	if (write) {
+		if (sysctl_shrink_memory & 1)
+			shrink_all_memory(totalram_pages);
+	}
+
+	return 0;
+}
+#endif
+
 /* It's optimal to keep kswapds on the same CPUs as their memory, but
    not required for correctness.  So if the last cpu in a node goes
    away, we get changed to run anywhere: as the first one comes back,
-- 
1.7.9.5
