lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150130062737.GB25699@htj.dyndns.org>
Date:	Fri, 30 Jan 2015 01:27:37 -0500
From:	Tejun Heo <tj@...nel.org>
To:	Greg Thelen <gthelen@...gle.com>
Cc:	Johannes Weiner <hannes@...xchg.org>,
	Michal Hocko <mhocko@...e.cz>, cgroups@...r.kernel.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Jan Kara <jack@...e.cz>, Dave Chinner <david@...morbit.com>,
	Jens Axboe <axboe@...nel.dk>,
	Christoph Hellwig <hch@...radead.org>,
	Li Zefan <lizefan@...wei.com>, hughd@...gle.com,
	Konstantin Khebnikov <khlebnikov@...dex-team.ru>
Subject: Re: [RFC] Making memcg track ownership per address_space or anon_vma

Hello, Greg.

On Thu, Jan 29, 2015 at 09:55:53PM -0800, Greg Thelen wrote:
> I find simplification appealing.  But I not sure it will fly, if for no
> other reason than the shared accountings.  I'm ignoring intentional
> sharing, used by carefully crafted apps, and just thinking about
> incidental sharing (e.g. libc).
> 
> Example:
> 
> $ mkdir small
> $ echo 1M > small/memory.limit_in_bytes
> $ (echo $BASHPID > small/cgroup.procs && exec sleep 1h) &
> 
> $ mkdir big
> $ echo 10G > big/memory.limit_in_bytes
> $ (echo $BASHPID > big/cgroup.procs && exec mlockall_database 1h) &
> 
> Assuming big/mlockall_database mlocks all of libc, then it will oom kill
> the small memcg because libc is owned by small due it having touched it
> first.  It'd be hard to figure out what small did wrong to deserve the
> oom kill.

The previous behavior was pretty unpredictable in terms of shared file
ownership too.  I wonder whether the better thing to do here is either
charging cases like this to the common ancestor or splitting the
charge equally among the accessors, which might be doable for ro
files.

> FWIW we've been using memcg writeback where inodes have a memcg
> writeback owner.  Once multiple memcg write to an inode then the inode
> becomes writeback shared which makes it more likely to be written.  Once
> cleaned the inode is then again able to be privately owned:
> https://lkml.org/lkml/2011/8/17/200

The problem is that it introduces deviations between memcg and
writeback / blkcg which will mess up pressure propagation.  Writeback
pressure can't be determined without its associated memcg and neither
can dirty balancing.  We sure can simplify things by trading off
accuracies at places but let's please try to do that throughout the
stack, not in the midpoint, so that we can say "if you do this, it'll
behave this way and you can see it showing up there".  The thing is if
we leave it half-way, in time, some will try to actively exploit
memcg's page granularity and we'll have to deal with writeback
behavior which is difficult to even characterize.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ