Message-id: <06c401d0c05e$663f2ee0$32bd8ca0$@samsung.com>
Date: Fri, 17 Jul 2015 12:28:47 +0530
From: PINTU KUMAR <pintu.k@...sung.com>
To: akpm@...ux-foundation.org, corbet@....net, vbabka@...e.cz,
gorcunov@...nvz.org, mhocko@...e.cz, emunson@...mai.com,
kirill.shutemov@...ux.intel.com, standby24x7@...il.com,
hannes@...xchg.org, vdavydov@...allels.com, hughd@...gle.com,
minchan@...nel.org, tj@...nel.org, rientjes@...gle.com,
xypron.glpk@....de, dzickus@...hat.com, prarit@...hat.com,
ebiederm@...ssion.com, rostedt@...dmis.org, uobergfe@...hat.com,
paulmck@...ux.vnet.ibm.com, iamjoonsoo.kim@....com,
ddstreet@...e.org, sasha.levin@...cle.com, koct9i@...il.com,
mgorman@...e.de, cj@...ux.com, opensource.ganesh@...il.com,
vinmenon@...eaurora.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-pm@...r.kernel.org, qiuxishi@...wei.com,
Valdis.Kletnieks@...edu
Cc: cpgs@...sung.com, pintu_agarwal@...oo.com, vishnu.ps@...sung.com,
rohit.kr@...sung.com, iqbal.ams@...sung.com, pintu.ping@...il.com,
pintu.k@...look.com
Subject: RE: [PATCHv2 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory
feature
Sorry, correcting a small typo below.
Please review and provide your comments.
This is version 2 of the previous patch.
> -----Original Message-----
> From: Pintu Kumar [mailto:pintu.k@...sung.com]
> Sent: Friday, July 17, 2015 12:00 PM
> Subject: [PATCHv2 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory
> feature
>
> This patch provides 2 things:
> 1. Add a new control called shrink_memory in /proc/sys/vm/.
> This control can be used to aggressively reclaim memory system-wide in one
> shot from user space. A value of 1 instructs the kernel to try to reclaim
> up to totalram_pages pages.
> Example: echo 1 > /proc/sys/vm/shrink_memory
>
> If any value other than 1 is written to shrink_memory, the write fails
> with EINVAL.
>
> 2. Enable the shrink_all_memory API in the kernel with a new
> CONFIG_SHRINK_MEMORY option.
> Currently, the shrink_all_memory function is used only during hibernation.
> With the new config option, this API can also be used in the non-hibernation
> case without disturbing the hibernation case.
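
For item 1, a minimal user-space sketch of triggering the control from C
rather than from a shell. It assumes a kernel built with
CONFIG_SHRINK_MEMORY from this patch; write_sysctl() and
trigger_shrink_memory() are illustrative helper names, not part of the patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a value string to a sysctl-style file.
 * Returns 0 on success or a negative errno value on failure
 * (for example -EINVAL if the kernel rejects the value).
 */
int write_sysctl(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t n;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, value, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;	/* short write: treat as an I/O error */

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}

/* Trigger a one-shot reclaim; the path is the one proposed by this patch. */
int trigger_shrink_memory(void)
{
	return write_sysctl("/proc/sys/vm/shrink_memory", "1\n");
}
```

A caller would treat any negative return as failure, for example -EINVAL
for a rejected value or -ENOENT when the control is not built in. Since the
patch registers the file with mode 0200, the write needs root.
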
>
> The detailed paper was presented at the Embedded Linux Conference, March 2015:
> http://events.linuxfoundation.org/sites/events/files/slides/
> %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
>
> A sample run is shown below:
> Device: ARMv7, Dual Core CPU 1.2GHz
> RAM: 512MB (Without SWAP/ZRAM)
> Linux Kernel: 3.10.17
> Scenario: Just after boot-up finished.
>
> BEFORE:
> -------------------------------------------------------------------------
> shell> free -tm ; cat /proc/buddyinfo
>              total       used       free     shared    buffers     cached
> Mem:           460        440         20          0         35        154
> -/+ buffers/cache:        250        209
> Swap:            0          0          0
> Total:         460        440         20
> Node 0, zone   Normal   1037    705     92     19     19     17      4      9      0      0      0
>
> shell> vmstat 1 &
>
> AFTER:
> -------------------------------------------------------------------------
> shell> echo 1 > /proc/sys/vm/shrink_memory
>
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  0  0      0  20768  35876 157876    0    0     0     0   64  177  0  1 99  0  0
> --------------------------------------------------------------------------------
> |1  0      0  33104  34864 149808    0    0     0     0   82  221  0 12 88  0  0|
> --------------------------------------------------------------------------------
>  0  0      0 188776   3000  54420    0    0     0     0  216  374  0 30 70  0  0
>  0  0      0 188400   3652  54528    0    0   740     8  188  337  2  1 95  2  0
>
> shell> free -tm ; cat /proc/buddyinfo
>              total       used       free     shared    buffers     cached
> Mem:           460        278        182          0          4         54
> -/+ buffers/cache:        219        240
> Swap:            0          0          0
> Total:         460        278        182
> Node 0, zone   Normal   5575   3158   1500    727    240     90     33     18     10      6      6
>
> RESULTS:
> -----------------------------------------------------
> Around 160MB of memory was recovered in one shot.
> Many higher-order pages were recovered in the process.
> From the vmstat output, system CPU usage is ~12% for about 1 second while
> the command is running.
> We also measured the power consumption using a H/W power monitor tool.
> Below is the result:
> Before - ~180mA
> During shrink memory - ~237mA
> Duration - ~0.5 sec
> Additional consumption: ~57mA
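
As a cross-check of the BEFORE/AFTER output above, the free memory can be
recomputed from the buddyinfo counts: each order-n entry is a free block of
2^n pages. The sketch below assumes 4 KiB pages on this device (the page
size is not stated in the log); the function names are illustrative.

```c
#include <stddef.h>

/*
 * Sum the free pages described by one /proc/buddyinfo zone line.
 * counts[i] is the number of free blocks of order i, and each
 * order-i block holds 2^i pages.
 */
unsigned long buddy_free_pages(const unsigned long *counts, size_t norders)
{
	unsigned long pages = 0;
	size_t order;

	for (order = 0; order < norders; order++)
		pages += counts[order] << order;
	return pages;
}

/* Convert pages to KiB; page_kib = 4 is an assumption for this device. */
unsigned long pages_to_kib(unsigned long pages, unsigned long page_kib)
{
	return pages * page_kib;
}
```

With the BEFORE counts this gives 5223 pages (about 20 MB), and with the
AFTER counts 46619 pages (about 182 MB), in line with the free column
reported by free -tm. Blocks of order 4 and above go from 49 to 403.
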
>
> FURTHER OBSERVATIONS:
> -----------------------------------------------------
> 37% reduction in application kills when memory shrink is called at boot-up.
> Around 4000 fewer page faults.
> Around 43% reduction in kswapd calls.
> Movement to the slowpath reduced drastically.
> Combining shrink_memory with compaction shows good benefits against
> fragmentation.
>
> APPLICATION LAUNCH BEHAVIOR:
> -----------------------------------------------------
> During First Launch:
> =====================================================================
> Application   Before_shrink_memory   After_shrink_memory   Difference
> Camera        1.981                  1.86                  0.121
> Gallery       1.276                  0.94                  0.336
> contacts      1.112                  0.941                 0.171
> messaging     0.886                  0.795                 0.091
> settings      1.257                  1.212                 0.045
> Music         1.854                  2.098                 -0.244
> Gmail         1.872                  1.935                 -0.063
> Browser       2.569                  2.677                 -0.108
> =====================================================================
>
> During Re-launch:
> =====================================================================
> Application   Before_shrink_memory   After_shrink_memory   Difference
> Camera        1.248                  0.976                 0.272
> Gallery       0.697                  0.633                 0.064
> contacts      0.506                  0.561                 -0.055
> messaging     0.533                  0.489                 0.044
> settings      0.833                  0.805                 0.028
> Music         0.832                  0.769                 0.063
> Gmail         0.913                  0.841                 0.072
> Browser       0.579                  0.57                  0.009
> =====================================================================
>
> Various other use cases where this can be used:
> ----------------------------------------------------------------------------
> 1) Just after system boot-up has finished, using the sysctl configuration
>    from a boot-up script.
> 2) During system suspend, after suspend_freeze_processes()
>    [kernel/power/suspend.c], based on certain conditions on fragmentation
>    or free memory state.
> 3) From the Android ION system heap driver, when order-4 allocations start
>    failing, by calling shrink_all_memory in a separate worker thread,
>    based on certain conditions.
> 4) It can be combined with compact_memory to achieve better results on
>    memory fragmentation.
> 5) It can be helpful in debugging and tuning various vm parameters.
> 6) It can help identify the maximum amount of memory that is reclaimable
>    at any point in time, and how many higher-order pages could be formed
>    from that reclaimable memory. Thus it can help in tuning the reserved
>    memory needs of a system accordingly.
> 7) It can be helpful in properly tuning the SWAP size in the system.
>    In shrink_all_memory, we set may_swap = 1, which means all unused pages
>    will be swapped out. Thus, by running shrink_memory on a heavily loaded
>    system, we can check how full the swap gets. That value, plus a 10%
>    delta, can be used as the maximum swap size.
>    Also, if ZRAM is used, it helps in compressing and storing the pages
>    for later use.
> 8) It can allow more new applications to be launched without killing the
>    older ones, by moving the least recently used pages to the SWAP area.
>    Thus user data can be retained.
> 9) It can be part of a system tool to quickly defragment the entire
>    system memory.
> 10) This may also help in reducing fragmentation within the CMA region.
> 11) More use cases can be identified.
>
> Most importantly, it can be more effective when applied intelligently, based
> on certain conditions.
> It should be executed always and the decision is left upto the user.
* It should _not_ be executed always. The decision is left to the user.
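
One way to read "applied intelligently" is to gate the write on a
free-memory threshold. The sketch below is only an illustration of such a
condition: it parses MemFree from /proc/meminfo-style text and compares it
with a caller-chosen threshold. The helper names and any threshold value
are hypothetical and not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the MemFree value (in kB) from text in /proc/meminfo format.
 * Returns -1 if no MemFree line is found.
 */
long meminfo_free_kb(const char *text)
{
	const char *line = text;
	long kb;

	while (line && *line) {
		if (sscanf(line, "MemFree: %ld kB", &kb) == 1)
			return kb;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline */
	}
	return -1;
}

/*
 * Hypothetical policy: shrink only when free memory is known and has
 * dropped below a caller-chosen threshold.
 */
int should_shrink(long free_kb, long threshold_kb)
{
	return free_kb >= 0 && free_kb < threshold_kb;
}
```

A daemon or boot script helper could read /proc/meminfo into a buffer, call
meminfo_free_kb() on it, and write 1 to shrink_memory only when
should_shrink() returns non-zero.
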
>
> Signed-off-by: Pintu Kumar <pintu.k@...sung.com>
> ---
> V2: Added min,max parameter for shrink_memory, suggested by
> Heinrich Schuchardt <xypron.glpk@....de>.
> Error handling in sysctl_shrinkmem_handler, for any value other than 1,
> suggested by Heinrich Schuchardt <xypron.glpk@....de>.
> Fixed HIBERNATION+SHRINK_MEMORY issue in shrink_all_memory,
> suggested by Valdis.Kletnieks@...edu.
> Restored gfp_mask to the original, because of other dependencies.
> Also, adding GFP_RECLAIM_MASK does not affect anything.
> Verified power consumption data during shrink_memory,
> as suggested by Johannes Weiner <hannes@...xchg.org>.
> Verified application launch/re-launch scenarios before/after
> shrink_memory, as suggested by Xishi Qiu <qiuxishi@...wei.com>.
> Updated the commit message with examples and use cases.
>
> Documentation/sysctl/vm.txt | 18 ++++++++++++++++++
> include/linux/swap.h | 7 +++++++
> kernel/sysctl.c | 16 ++++++++++++++++
> mm/Kconfig | 8 ++++++++
> mm/vmscan.c | 34 ++++++++++++++++++++++++++++++++--
> 5 files changed, 81 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 9832ec5..54eda3a 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
> - page-cluster
> - panic_on_oom
> - percpu_pagelist_fraction
> +- shrink_memory
> - stat_interval
> - swappiness
> - user_reserve_kbytes
> @@ -718,6 +719,23 @@ sysctl, it will revert to this default behavior.
>
> ==============================================================
>
> +shrink_memory
> +
> +This control is available only when CONFIG_SHRINK_MEMORY is set. This
> +control can be used to aggressively reclaim memory system-wide in one
> +shot. A value of 1 instructs the kernel to try to reclaim up to
> +totalram_pages pages.
> +For example, to reclaim all memory system-wide we can do:
> +# echo 1 > /proc/sys/vm/shrink_memory
> +
> +If any value other than 1 is written to shrink_memory, EINVAL is returned.
> +
> +For more information about this control, please see the following
> +presentation from the Embedded Linux Conference, 2015:
> +http://events.linuxfoundation.org/sites/events/files/slides/
> +%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
> +
> +==============================================================
> +
> stat_interval
>
> The time interval between which vm statistics are updated. The default
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 9a7adfb..6505b0b 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -333,6 +333,13 @@ extern int vm_swappiness;
> extern int remove_mapping(struct address_space *mapping, struct page *page);
> extern unsigned long vm_total_pages;
>
> +#ifdef CONFIG_SHRINK_MEMORY
> +extern int sysctl_shrink_memory;
> +extern int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
> +		void __user *buffer, size_t *length, loff_t *ppos);
> +#endif
> +
> +
> #ifdef CONFIG_NUMA
> extern int zone_reclaim_mode;
> extern int sysctl_min_unmapped_ratio;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index c566b56..e66581b 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -275,6 +275,11 @@ static int min_extfrag_threshold;
> static int max_extfrag_threshold = 1000;
> #endif
>
> +#ifdef CONFIG_SHRINK_MEMORY
> +static int min_shrink_memory = 1;
> +static int max_shrink_memory = 1;
> +#endif
> +
> static struct ctl_table kern_table[] = {
> {
> .procname = "sched_child_runs_first",
> @@ -1351,6 +1356,17 @@ static struct ctl_table vm_table[] = {
> },
>
> #endif /* CONFIG_COMPACTION */
> +#ifdef CONFIG_SHRINK_MEMORY
> + {
> + .procname = "shrink_memory",
> + .data = &sysctl_shrink_memory,
> + .maxlen = sizeof(int),
> + .mode = 0200,
> + .proc_handler = sysctl_shrinkmem_handler,
> + .extra1 = &min_shrink_memory,
> + .extra2 = &max_shrink_memory,
> + },
> +#endif
> {
> .procname = "min_free_kbytes",
> .data = &min_free_kbytes,
> diff --git a/mm/Kconfig b/mm/Kconfig
> index b3a60ee..8e04bd9 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -657,3 +657,11 @@ config DEFERRED_STRUCT_PAGE_INIT
> when kswapd starts. This has a potential performance impact on
> processes running early in the lifetime of the systemm until kswapd
> finishes the initialisation.
> +
> +config SHRINK_MEMORY
> + bool "Allow for system-wide shrinking of memory"
> + default n
> + depends on MMU
> + help
> + It enables support for system-wide memory reclaim in one shot using
> + echo 1 > /proc/sys/vm/shrink_memory.
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c8d8282..e802fa7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -58,6 +58,10 @@
> #define CREATE_TRACE_POINTS
> #include <trace/events/vmscan.h>
>
> +#ifdef CONFIG_SHRINK_MEMORY
> +#include <linux/suspend.h>
> +#endif
> +
> struct scan_control {
> /* How many pages shrink_list() should reclaim */
> unsigned long nr_to_reclaim;
> @@ -3557,7 +3561,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
> wake_up_interruptible(&pgdat->kswapd_wait);
> }
>
> -#ifdef CONFIG_HIBERNATION
> +#if defined(CONFIG_HIBERNATION) || defined(CONFIG_SHRINK_MEMORY)
> /*
> * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
> * freed pages.
> @@ -3576,12 +3580,16 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> .may_writepage = 1,
> .may_unmap = 1,
> .may_swap = 1,
> - .hibernation_mode = 1,
> };
> struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
> struct task_struct *p = current;
> unsigned long nr_reclaimed;
>
> + if (system_entering_hibernation())
> + sc.hibernation_mode = 1;
> + else
> + sc.hibernation_mode = 0;
> +
> p->flags |= PF_MEMALLOC;
> lockdep_set_current_reclaim_state(sc.gfp_mask);
> reclaim_state.reclaimed_slab = 0;
> @@ -3597,6 +3605,28 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> }
> #endif /* CONFIG_HIBERNATION */
>
> +#ifdef CONFIG_SHRINK_MEMORY
> +int sysctl_shrink_memory;
> +/* This is the entry point for system-wide shrink memory
> + * via /proc/sys/vm/shrink_memory */
> +int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
> +		void __user *buffer, size_t *length, loff_t *ppos)
> +{
> +	int ret;
> +
> +	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
> +	if (ret)
> +		return ret;
> +
> +	if (write) {
> +		if (sysctl_shrink_memory & 1)
> +			shrink_all_memory(totalram_pages);
> +	}
> +
> +	return 0;
> +}
> +#endif
> +
> /* It's optimal to keep kswapds on the same CPUs as their memory, but
> not required for correctness. So if the last cpu in a node goes
> away, we get changed to run anywhere: as the first one comes back,
> --
> 1.7.9.5
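
On the min/max handling in the kernel/sysctl.c hunk: with extra1 and extra2
both pointing at 1, the range check in proc_dointvec_minmax is what rejects
every value other than 1. A simplified user-space model of that
accept/reject decision (not kernel code; the helper names are illustrative)
looks like this:

```c
#include <errno.h>

/*
 * Simplified user-space model of the range check that the patch
 * configures through .extra1/.extra2. This is NOT kernel code; it only
 * mirrors the accept/reject decision described in the patch.
 */
int check_range(int val, int min, int max)
{
	if (val < min || val > max)
		return -EINVAL;
	return 0;
}

/* With min == max == 1, as in the patch, only the value 1 is accepted. */
int shrink_memory_value_ok(int val)
{
	return check_range(val, 1, 1);
}
```

Because the handler returns early on a non-zero result from the range
check, shrink_all_memory is only reached after a successful write of 1.
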