Date:	Tue, 16 Sep 2014 10:34:55 +0900
From:	Kamezawa Hiroyuki
To:	Johannes Weiner,
	Vladimir Davydov
CC:	Michal Hocko, Greg Thelen,
	Hugh Dickins,
	Motohiro Kosaki,
	Glauber Costa, Tejun Heo,
	Andrew Morton,
	Pavel Emelianov,
	Konstantin Khorenko,
	LKML-MM,
	LKML-cgroups,
	LKML
Subject: Re: [RFC] memory cgroup: my thoughts on memsw

(2014/09/16 4:14), Johannes Weiner wrote:
> Hi Vladimir,
> On Thu, Sep 04, 2014 at 06:30:55PM +0400, Vladimir Davydov wrote:
>> To sum it up, the current mem + memsw configuration scheme doesn't allow
>> us to limit swap usage if we want to partition the system dynamically
>> using soft limits. Actually, it also looks rather confusing to me. We
>> have mem limit and mem+swap limit. I bet that from the first glance, an
>> average admin will think it's possible to limit swap usage by setting
>> the limits so that the difference between memory.memsw.limit and
>> memory.limit equals the maximal swap usage, but (surprise!) it isn't
>> really so. It holds if there's no global memory pressure, but otherwise
>> swap usage is only limited by memory.memsw.limit! IMHO, it isn't
>> something obvious.
> Agreed, memory+swap accounting & limiting is broken.
>>   - Anon memory is handled by the user application, while file caches are
>>     all on the kernel. That means the application will *definitely* die
>>     w/o anon memory. W/o file caches it usually can survive, but the more
>>     caches it has the better it feels.
>>   - Anon memory is not that easy to reclaim. Swap out is a really slow
>>     process, because data are usually read/written w/o any specific
>>     order. Dropping file caches is much easier. Typically we have lots of
>>     clean pages there.
>>   - Swap space is limited. And today, it's OK to have TBs of RAM and only
>>     several GBs of swap. Customers simply don't want to waste their disk
>>     space on that.
>> Finally, my understanding (may be crazy!) how the things should be
>> configured. Just like now, there should be mem_cgroup->res accounting
>> and limiting total user memory (cache+anon) usage for processes inside
>> cgroups. This is where there's nothing to do. However, mem_cgroup->memsw
>> should be reworked to account *only* memory that may be swapped out plus
>> memory that has been swapped out (i.e. swap usage).
> But anon pages are not a resource, they are a swap space liability.
> Think of virtual memory vs. physical pages - the use of one does not
> necessarily result in the use of the other.  Without memory pressure,
> anonymous pages do not consume swap space.
> What we *should* be accounting and limiting here is the actual finite
> resource: swap space.  Whenever we try to swap a page, its owner
> should be charged for the swap space - or the swapout be rejected.
> For hard limit reclaim, the semantics of a swap space limit would be
> fairly obvious, because it's clear who the offender is.
> However, in an overcommitted machine, the amount of swap space used by
> a particular group depends just as much on the behavior of the other
> groups in the system, so the per-group swap limit should be enforced
> even during global reclaim to feed back pressure on whoever is causing
> the swapout.  If reclaim fails, the global OOM killer triggers, which
> should then off the group with the biggest soft limit excess.
> As far as implementation goes, it should be doable to try-charge from
> add_to_swap() and keep the uncharging in swap_entry_free().
> We'll also have to extend the global OOM killer to be memcg-aware, but
> we've been meaning to do that anyway.

When we introduced the memsw limit, we tried to avoid affecting global memory reclaim.
So we limited memory+swap rather than swap alone.

Now, global memory reclaim is memcg-aware. So I think limiting swap alone, rather than
anon+swap, may be an option. The change will reduce res_counter accesses. Hmm, it would be
desirable to move anon pages to Unevictable if the memcg's swap slot count is 0.
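
As a rough illustration of the swap-only accounting discussed above, here is a toy
userspace model in Python. It is not kernel code: the class SwapCounter and its methods
are hypothetical names made up for this sketch, and only the mapping to add_to_swap()
and swap_entry_free() comes from the thread.

```python
# Toy userspace model of a per-group swap limit. Not kernel code; all names
# here (SwapCounter, try_charge, uncharge, anon_is_evictable) are hypothetical.
class SwapCounter:
    """Charge swap space on swap-out, uncharge when the swap entry is freed."""

    def __init__(self, limit_pages):
        self.limit = limit_pages  # maximum swap pages this group may use
        self.usage = 0            # swap pages currently charged to the group

    def try_charge(self, nr_pages=1):
        # Models the try-charge proposed for add_to_swap().
        if self.usage + nr_pages > self.limit:
            return False          # swap-out rejected, the page stays in memory
        self.usage += nr_pages
        return True

    def uncharge(self, nr_pages=1):
        # Models the uncharge proposed for swap_entry_free().
        self.usage = max(0, self.usage - nr_pages)

    def anon_is_evictable(self):
        # Once the group has no swap left, its anon pages cannot be swapped
        # out, so they could be treated as unevictable.
        return self.usage < self.limit


group = SwapCounter(limit_pages=2)
results = [group.try_charge() for _ in range(3)]
print(results)                    # [True, True, False]
group.uncharge()                  # one swap entry freed
print(group.try_charge())         # True
```

Only swap-out and swap-entry free touch the counter in this model, which is why
swap-only accounting should need fewer counter accesses than anon+swap accounting.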

Anyway, I think softlimit should be re-implemented first. That will be the starting point.

