Date:	Wed, 02 Apr 2014 14:08:04 +0100
From:	Glyn Normington <gnormington@...ivotal.com>
To:	Tejun Heo <tj@...nel.org>
CC:	linux-kernel@...r.kernel.org
Subject: Kernel scanning/freeing to relieve cgroup memory pressure

Hi Tejun

I'd like you and the other cgroups developers to be aware of the use case
below.

Regards,
Glyn

Currently, a memory cgroup can hit its limit and go OOM even when pages
could, in principle, be reclaimed by the kernel, because the kernel does
not respond directly to cgroup-local memory pressure.

A use case where this matters is running a moderately large Java
application in a memory cgroup in a PaaS environment, where the cost to
the user depends on the memory limit ([1]), so users need to tune that
limit down to reduce their costs. During application initialisation,
large numbers of JAR files are opened (read-only) and read while loading
the application code and its dependencies. This shows up as a peak in
page cache usage which can push the cgroup's memory usage significantly
higher than the value actually needed to run the application.
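
For illustration, here is a minimal sketch in C (assuming the v1 memcg
interface, a hierarchy mounted at /sys/fs/cgroup/memory, and a
hypothetical cgroup named "app") which samples the page cache charged to
the cgroup; the JAR-loading peak described above appears in the "cache"
counter of memory.stat:

/*
 * Sample the page cache charged to a v1 memory cgroup once a second.
 * Mount point and cgroup name "app" are hypothetical.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char line[256];
	int i;

	for (i = 0; i < 60; i++) {
		FILE *f = fopen("/sys/fs/cgroup/memory/app/memory.stat", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f))
			if (!strncmp(line, "cache ", 6))
				printf("%s", line);	/* page cache, in bytes */
		fclose(f);
		sleep(1);
	}
	return 0;
}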

Possible approaches include (1) automatic response to cgroup-local
memory pressure in the kernel, and (2) a kernel API for reclaiming
memory from a cgroup, which could be driven from userspace under OOM
notification (with the OOM killer disabled for the cgroup and re-enabled
only if the cgroup remained OOM after asking the kernel to reclaim
memory).
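
For what it's worth, the existing v1 memcg machinery already lets a
userspace agent disable the OOM killer for a cgroup and block on an
eventfd until the cgroup hits its limit (Documentation/cgroups/memory.txt).
A rough sketch, again with a hypothetical cgroup named "app"; the reclaim
call in the loop body is exactly the missing kernel API proposed in (2):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
	int efd = eventfd(0, 0);
	int ofd = open("/sys/fs/cgroup/memory/app/memory.oom_control", O_RDWR);
	int cfd = open("/sys/fs/cgroup/memory/app/cgroup.event_control", O_WRONLY);
	char buf[64];
	uint64_t events;

	if (efd < 0 || ofd < 0 || cfd < 0)
		return 1;

	/* Disable the in-kernel OOM killer for this cgroup. */
	if (write(ofd, "1", 1) != 1)
		return 1;

	/* Register efd to be signalled when the cgroup goes OOM. */
	snprintf(buf, sizeof(buf), "%d %d", efd, ofd);
	if (write(cfd, buf, strlen(buf)) < 0)
		return 1;

	for (;;) {
		if (read(efd, &events, sizeof(events)) != sizeof(events))
			return 1;
		/*
		 * This is where the proposed reclaim API would be called.
		 * If the cgroup were still over its limit afterwards, the
		 * OOM killer could be re-enabled via memory.oom_control.
		 */
		fprintf(stderr, "cgroup OOM: reclaim wanted\n");
	}
}

Note that with the OOM killer disabled, tasks in the cgroup sleep on the
OOM waitqueue until memory is freed or the limit is raised, so the agent
must respond promptly.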

Clearly (1) is the preferred approach. The closest existing facility to
(2) is asking the kernel to free the page cache with `echo 1 >
/proc/sys/vm/drop_caches`, but that is far too wide-ranging, especially
in a PaaS environment hosting multiple applications. A similar facility
could be provided per cgroup via a cgroup pseudo-file such as
`memory.drop_caches`.
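
To be clear, `memory.drop_caches` does not exist; the sketch below only
illustrates, by analogy with the global interface, how such a per-cgroup
knob might be driven (cgroup path again hypothetical):

/*
 * Hypothetical: memory.drop_caches is not in any kernel; this only
 * shows how the proposed per-cgroup knob would be used, by analogy
 * with the existing global /proc/sys/vm/drop_caches interface.
 */
#include <fcntl.h>
#include <unistd.h>

static int drop_caches(const char *path)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, "1", 1);	/* "1" = free page cache only */
	close(fd);
	return n == 1 ? 0 : -1;
}

int main(void)
{
	/* Existing global knob: affects every application on the host. */
	drop_caches("/proc/sys/vm/drop_caches");
	/* Proposed per-cgroup knob: would touch only cgroup "app". */
	drop_caches("/sys/fs/cgroup/memory/app/memory.drop_caches");
	return 0;
}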

Other approaches include a mempressure cgroup ([2]), which would not be
suitable for PaaS applications; see [3] for Andrew Morton's response. A
related workaround ([4]) was included in the 3.6 kernel.

Related discussions:
[1] https://groups.google.com/a/cloudfoundry.org/d/topic/vcap-dev/6M8BDV_tq7w/discussion
[2] https://lwn.net/Articles/531077/
[3] https://lwn.net/Articles/531138/
[4] https://lkml.org/lkml/2013/6/6/462 and
    https://github.com/torvalds/linux/commit/e62e384e