Date:	Mon, 20 Jul 2015 21:43:02 +0530
From:	PINTU KUMAR <pintu.k@...sung.com>
To:	'Mel Gorman' <mgorman@...e.de>
Cc:	akpm@...ux-foundation.org, corbet@....net, vbabka@...e.cz,
	gorcunov@...nvz.org, mhocko@...e.cz, emunson@...mai.com,
	kirill.shutemov@...ux.intel.com, standby24x7@...il.com,
	hannes@...xchg.org, vdavydov@...allels.com, hughd@...gle.com,
	minchan@...nel.org, tj@...nel.org, rientjes@...gle.com,
	xypron.glpk@....de, dzickus@...hat.com, prarit@...hat.com,
	ebiederm@...ssion.com, rostedt@...dmis.org, uobergfe@...hat.com,
	paulmck@...ux.vnet.ibm.com, iamjoonsoo.kim@....com,
	ddstreet@...e.org, sasha.levin@...cle.com, koct9i@...il.com,
	cj@...ux.com, opensource.ganesh@...il.com, vinmenon@...eaurora.org,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, linux-pm@...r.kernel.org, qiuxishi@...wei.com,
	Valdis.Kletnieks@...edu, cpgs@...sung.com, pintu_agarwal@...oo.com,
	vishnu.ps@...sung.com, rohit.kr@...sung.com, iqbal.ams@...sung.com,
	pintu.ping@...il.com, pintu.k@...look.com
Subject: RE: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory
 feature

Hi,

Thank you all for reviewing the patch and providing your valuable comments and
suggestions.
During the ELC conference, many people suggested submitting the patch to
mainline, so I have posted it here to get others' opinions.

If you have any further suggestions for experiments or verification, please let
me know.

The intention was only to open up the shrink_all_memory API for some use cases.

I am not saying that it needs to be called continuously. It would be used only
under certain conditions, and only when deemed necessary.
The same technique is already used in hibernation to reduce the size of the RAM
snapshot image.
But in the embedded world hibernation is not used, so this facility cannot be
utilized there.

Thanks once again for the review and feedback.


> -----Original Message-----
> From: Mel Gorman [mailto:mgorman@...e.de]
> Sent: Monday, July 20, 2015 1:58 PM
> To: Pintu Kumar
> Cc: akpm@...ux-foundation.org; corbet@....net; vbabka@...e.cz;
> gorcunov@...nvz.org; mhocko@...e.cz; emunson@...mai.com;
> kirill.shutemov@...ux.intel.com; standby24x7@...il.com;
> hannes@...xchg.org; vdavydov@...allels.com; hughd@...gle.com;
> minchan@...nel.org; tj@...nel.org; rientjes@...gle.com;
> xypron.glpk@....de; dzickus@...hat.com; prarit@...hat.com;
> ebiederm@...ssion.com; rostedt@...dmis.org; uobergfe@...hat.com;
> paulmck@...ux.vnet.ibm.com; iamjoonsoo.kim@....com; ddstreet@...e.org;
> sasha.levin@...cle.com; koct9i@...il.com; cj@...ux.com;
> opensource.ganesh@...il.com; vinmenon@...eaurora.org;
> linux-doc@...r.kernel.org; linux-kernel@...r.kernel.org; linux-mm@...ck.org;
> linux-pm@...r.kernel.org; qiuxishi@...wei.com; Valdis.Kletnieks@...edu;
> cpgs@...sung.com; pintu_agarwal@...oo.com; vishnu.ps@...sung.com;
> rohit.kr@...sung.com; iqbal.ams@...sung.com; pintu.ping@...il.com;
> pintu.k@...look.com
> Subject: Re: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory
> feature
> 
> On Mon, Jul 20, 2015 at 09:59:04AM +0530, Pintu Kumar wrote:
> > This patch provides 2 things:
> > 1. Add a new control called shrink_memory in /proc/sys/vm/.
> > This control can be used to aggressively reclaim memory system-wide in
> > one shot from user space. A value of 1 instructs the kernel to try to
> > reclaim up to totalram_pages pages in the system.
> > Example: echo 1 > /proc/sys/vm/shrink_memory
> >
> > If any value other than 1 is written to shrink_memory, the write fails
> > with EINVAL.
> >
> > 2. Enable the shrink_all_memory API in the kernel with a new
> > CONFIG_SHRINK_MEMORY option.
> > Currently, the shrink_all_memory function is used only during hibernation.
> > With the new config option, this API can also be used in non-hibernation
> > cases without disturbing the hibernation case.
> >
> > The details were presented at the Embedded Linux Conference, March 2015:
> > http://events.linuxfoundation.org/sites/events/files/slides/%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
> >
> 
> Johannes has already reviewed this series and explained why it's a bad idea.
> This is just a note to say that I agree with the points he made and also think
> that adding an additional knob to reclaim data from user space is a bad idea.
> Even drop_caches is only intended as a debugging tool to illustrate cases where
> normal reclaim is broken. Similarly, compact_node exists as a debugging tool to
> check whether direct compaction is behaving as expected.
> 
> If this is invoked when high-order allocations start failing and memory is
> fragmented with unreclaimable memory, then it'll potentially keep thrashing,
> depending on the userspace monitor implementation. If the latency of a
> high-order allocation is important, then reclaim/compaction should be examined
> and improved. If the reliability of high-order allocations is important, then
> you need to reserve the memory in advance. If that is undesirable due to a
> constrained memory environment, then one approach is to modify how pages are
> grouped by mobility, as described in the leader of the series "Remove zonelist
> cache and high-order watermark checking".
> There are two suggestions there for out-of-tree patches that would make
> high-order allocations more reliable but that are not suitable for mainline.
> 
> Yes, I read your presentation, but let's go through the use cases you list
> again:
> 
> > Various other use cases where this can be used:
> > ----------------------------------------------------------------------------
> > 1) Just after system boot-up has finished, using the sysctl configuration
> >    from a boot-up script.
> 
> Almost no benefit. Any page cache that is active and now cold would be
> trivially reclaimed later.
> 
> > 2) During system suspend, after suspend_freeze_processes()
> >    [kernel/power/suspend.c],
> >    based on certain conditions about fragmentation or free memory state.
> 
> No gain.
> 
> > 3) From the Android ION system heap driver, when order-4 allocations start
> >    failing, by calling shrink_all_memory in a separate worker thread, based
> >    on certain conditions.
> 
> If order-4 allocations fail when shrink_all_memory works, and the order-4
> allocation is required to work, then the aggressiveness of reclaim/compaction
> needs to be fixed so that it reclaims all system memory if necessary. Right now
> it can bail out because it is generally expected that no subsystem depends on
> high-order allocations succeeding for functional correctness.
> 
> > 4) It can be combined with compact_memory to achieve better results on
> >    memory fragmentation.
> 
> Only by reclaiming the world. In 3.0 the system behaved like this. High-order
> stress tests could take hours to complete as the system was continually
> thrashed. Today the same test would complete in about 15 minutes, albeit with
> lower allocation success rates. We ran into multiple issues where high-order
> allocation requests caused the system to thrash, and triggering such thrashing
> from userspace is not an improvement.
> 
> > 5) It can be helpful in debugging and tuning various vm parameters.
> 
> No more than drop_caches is.
> 
> > 6) It can be helpful to identify how much of maximum memory could be
> >    reclaimable at any point of time.
> 
> Only by reclaiming the world. A less destructive means is to use MemAvailable
> from /proc/meminfo.
> 
> >    And how many higher-order pages could be formed with this amount of
> >    reclaimable memory.
> 
> Only by reclaiming the world.
> 
> >    Thus it can help in tuning the reserved memory needs of a system
> >    accordingly.
> 
> By which time it's too late as a reboot will be necessary to set the reserve.
> 
> > 7) It can be helpful in properly tuning the SWAP size in the system.
> 
> Only for a single point in time, as it's workload dependent. The same data can
> be inferred from smaps.
> 
> >    In shrink_all_memory, we set may_swap = 1, which means all unused pages
> >    will be swapped out.
> >    Thus, by running shrink_memory on a heavily loaded system, we can check
> >    how full the swap gets.
> >    That can be taken as the maximum swap size, with a 10% delta.
> >    Also, if ZRAM is used, it helps us compress and store the pages for
> >    later use.
> > 8) It can be helpful to allow more new applications to be launched without
> >    killing the older ones.
> 
> Reclaim would achieve the same effect over time.
> 
> >    This is done by moving the least recently used pages to the swap area,
> >    so user data can be retained.
> > 9) Can be part of a system utility to quickly defragment entire system
> >    memory.
> 
> Any memory that is not on the LRU, or indirectly pinned by pages on the LRU, is
> unaffected.
> 
> If high-order allocation latency or reliability is important, then you really
> need a different solution, because unless this thing runs continually to keep
> memory unused, it'll eventually fail hard and the system will perform poorly
> in the meantime.
> 
> --
> Mel Gorman
> SUSE Labs
