Message-ID: <50EDDF1E.6010705@parallels.com>
Date:	Thu, 10 Jan 2013 01:20:30 +0400
From:	Glauber Costa <glommer@...allels.com>
To:	Tejun Heo <tj@...nel.org>
CC:	Anton Vorontsov <anton.vorontsov@...aro.org>,
	David Rientjes <rientjes@...gle.com>,
	Pekka Enberg <penberg@...nel.org>,
	Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...e.cz>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Luiz Capitulino <lcapitulino@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Greg Thelen <gthelen@...gle.com>,
	Leonid Moiseichuk <leonid.moiseichuk@...ia.com>,
	KOSAKI Motohiro <kosaki.motohiro@...il.com>,
	Minchan Kim <minchan@...nel.org>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
	John Stultz <john.stultz@...aro.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, <linaro-kernel@...ts.linaro.org>,
	<patches@...aro.org>, <kernel-team@...roid.com>
Subject: Re: [PATCH 1/2] Add mempressure cgroup

On 01/10/2013 12:37 AM, Tejun Heo wrote:
> Hello,
> 
> Can you please cc me too when posting further patches?  I kinda missed
> the whole discussion upto this point.
> 
> On Fri, Jan 04, 2013 at 12:29:11AM -0800, Anton Vorontsov wrote:
>> This commit implements David Rientjes' idea of mempressure cgroup.
>>
>> The main characteristics are the same to what I've tried to add to vmevent
>> API; internally, it uses Mel Gorman's idea of scanned/reclaimed ratio for
>> pressure index calculation. But we don't expose the index to the userland.
>> Instead, there are three levels of the pressure:
>>
>>  o low (just reclaiming, e.g. caches are draining);
>>  o medium (allocation cost becomes high, e.g. swapping);
>>  o oom (about to oom very soon).
>>
>> The rationale behind exposing levels and not the raw pressure index
>> described here: http://lkml.org/lkml/2012/11/16/675
>>
>> For a task it is possible to be in both cpusets, memcg and mempressure
>> cgroups, so by rearranging the tasks it is possible to watch a specific
>> pressure (i.e. caused by cpuset and/or memcg).
> 
> So, cgroup is headed towards single hierarchy.  Dunno how much it
> would affect mempressure but it probably isn't wise to design with
> focus on multiple hierarchies.
> 
> Isn't memory reclaim and oom condition tied to memcgs when memcg is in
> use?  It seems natural to tie mempressure to memcg.  Is there some
> reason this should be a separate cgroup.  I'm kinda worried this is
> headed cpuacct / cpu silliness we have.  Glauber, what's your opinion
> here?
> 

I've already said this in a previous incarnation of this thread. But
I'll summarize my main points:

* I believe this mechanism is superior to the memcg notification
mechanism.
* I believe the memcg notification mechanism is quite coarse - we
actually define the thresholds prior to flushing the stock, which means
we can be wrong by as much as 32 * ncpus.
* I agree with you that most of the data will come from memcg, so I
think this should all be part of memcg.
* memcg is indeed expensive even when it is not being used, so global
users would like to avoid it. This is true, but I've already
demonstrated that it is an implementation problem rather than a
conceptual problem, and can be fixed - although I have not yet had the
time to go back to it (but now I have a lot less on my shoulders than
before).
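
To give a feel for the "32 * ncpus" figure, here is a rough
back-of-the-envelope sketch. It assumes the 32 is a per-CPU batch of
pages and that pages are 4 KiB; neither is spelled out in this mail,
and the function name is made up for illustration.

```python
# Illustrative arithmetic only. The 4 KiB page size and the reading of
# "32" as a per-CPU batch of pages are assumptions, not stated in the mail.
PER_CPU_BATCH = 32      # the "32" above, assumed to be pages per CPU
PAGE_SIZE = 4096        # bytes, assumed


def worst_case_threshold_error(ncpus, batch=PER_CPU_BATCH, page_size=PAGE_SIZE):
    """Return (pages, bytes) by which a threshold check could be off."""
    pages = batch * ncpus
    return pages, pages * page_size


if __name__ == "__main__":
    for ncpus in (4, 64, 1024):
        pages, nbytes = worst_case_threshold_error(ncpus)
        print(f"{ncpus:5d} CPUs: up to {pages:6d} pages "
              f"({nbytes / 2**20:.1f} MiB)")
```

Under those assumptions the possible error grows linearly with the
number of CPUs, which is why the mechanism gets coarser on big boxes.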

Given the above, I believe that ideally we should use this pressure
mechanism in memcg, replacing the current memcg notification mechanism.
It would work more or less the way timer expiration does: you could
still write numbers for compatibility, but those numbers would be
internally mapped into the levels Anton is proposing, which makes *way*
more sense.
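
A minimal sketch of what such a compatibility mapping could look like
follows. Only the three level names come from Anton's description; the
cut-off fractions and the function name are invented for illustration,
and a real mapping would have to be decided in the patches themselves.

```python
# Hypothetical compatibility shim. The cut-off fractions below are invented
# for illustration and do not come from Anton's patches.
MEDIUM_FRACTION = 0.60   # assumed cut-off for "medium"
OOM_FRACTION = 0.95      # assumed cut-off for "oom"


def threshold_to_level(threshold_bytes, limit_bytes):
    """Map a legacy numeric usage threshold onto one of the three levels."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    fraction = threshold_bytes / limit_bytes
    if fraction >= OOM_FRACTION:
        return "oom"
    if fraction >= MEDIUM_FRACTION:
        return "medium"
    return "low"


if __name__ == "__main__":
    limit = 1 << 30  # a 1 GiB limit, chosen arbitrarily
    for threshold in (100 << 20, 700 << 20, 1020 << 20):
        print(threshold, "->", threshold_to_level(threshold, limit))
```

Userspace would keep writing numbers, and the kernel would only ever
deliver one of the three levels.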

If that is not possible, they should coexist as a "notification" and a
"pressure" mechanism inside memcg.

The main argument against it centered on cpusets also being able to
participate. I haven't yet understood how that would work. In
particular, I saw no mention of cpusets in the patches.

I will say again that I fully know memcg is expensive. We all do.
However, that only matters for the global case. For the child cgroup
case, you are *already* paying this cost anyway. And for the global
case, we should not use its cost as an excuse: we should fix it, or
otherwise prove that it is unfixable.

