Message-ID: <20150205131514.GD25736@htj.dyndns.org>
Date:	Thu, 5 Feb 2015 08:15:14 -0500
From:	Tejun Heo <tj@...nel.org>
To:	Greg Thelen <gthelen@...gle.com>
Cc:	Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Hocko <mhocko@...e.cz>,
	Cgroups <cgroups@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Jan Kara <jack@...e.cz>, Dave Chinner <david@...morbit.com>,
	Jens Axboe <axboe@...nel.dk>,
	Christoph Hellwig <hch@...radead.org>,
	Li Zefan <lizefan@...wei.com>, Hugh Dickins <hughd@...gle.com>
Subject: Re: [RFC] Making memcg track ownership per address_space or anon_vma

Hello, Greg.

On Wed, Feb 04, 2015 at 03:51:01PM -0800, Greg Thelen wrote:
> I think the linux-next low (and the TBD min) limits also have the
> problem for more than just the root memcg.  I'm thinking of a 2M file
> shared between C and D below.  The file will be charged to common parent
> B.
> 
> 	A
> 	+-B    (usage=2M lim=3M min=2M)
> 	  +-C  (usage=0  lim=2M min=1M shared_usage=2M)
> 	  +-D  (usage=0  lim=2M min=1M shared_usage=2M)
> 	  \-E  (usage=0  lim=2M min=0)
> 
> The problem arises if A/B/E allocates more than 1M of private
> reclaimable file data.  This pushes A/B into reclaim which will reclaim
> both the shared file from A/B and private file from A/B/E.  In contrast,
> the current per-page memcg would've protected the shared file in either
> C or D leaving A/B reclaim to only attack A/B/E.
> 
> Pinning the shared file to either C or D, using TBD policy such as mount
> option, would solve this for tightly shared files.  But for wide fanout
> file (libc) the admin would need to assign a global bucket and this
> would be a pain to size due to various job requirements.

Shouldn't we be able to handle it the same way as I proposed for
handling sharing?  The above would look like

 	A
 	+-B    (usage=2M lim=3M min=2M hosted_usage=2M)
 	  +-C  (usage=0  lim=2M min=1M shared_usage=2M)
 	  +-D  (usage=0  lim=2M min=1M shared_usage=2M)
 	  \-E  (usage=0  lim=2M min=0)

Now, we don't wanna use B's min verbatim on the hosted inodes shared
by children.  But we're unconditionally charging the shared amount to
all sharing children, which means that we're eating into the min
settings of all participating children.  So, we should be able to use
the sum of all sharing children's min-covered amounts as the inode's
min, which of course is to be contained inside the min of the parent.

Above, we're charging 2M to each of C and D, and each has a 1M min
which is being consumed by the shared charge (the shared part won't
get reclaimed from the internal pressure of the children, so we're
really taking that part away from them).  Summing them up, the shared
inode would have 2M of protection, which is honored as long as B as a
whole is under its 3M limit.  This is similar to creating a dedicated
child for each shared resource for low limits.  The downside is that
we end up guarding the shared inodes more than non-shared ones, but,
after all, we're charging it to everybody who's using it.
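
As a rough sketch of the arithmetic above (illustration only, not kernel
code): the function name shared_inode_min is hypothetical, "min-covered
amount" is assumed to mean the smaller of a child's min and its shared
charge, and the cap by the hosted size is an added assumption.  The cap
by the parent's min follows the "contained inside the min of the parent"
rule.

```python
M = 1 << 20  # 1 MiB


def shared_inode_min(parent_min, hosted_usage, children):
    """Return the min protection for an inode hosted in the parent.

    children is a list of (child_min, shared_charge) pairs, one per
    child cgroup.  Assumption: a child's min-covered amount is
    min(child_min, shared_charge).
    """
    covered = sum(min(c_min, charge) for c_min, charge in children)
    # The result must fit inside the parent's min (from the proposal)
    # and cannot exceed the hosted size (assumption).
    return min(covered, hosted_usage, parent_min)


# The A/B hierarchy from the example: C and D share a 2M file, E does not.
children = [
    (1 * M, 2 * M),  # C: min=1M, shared_usage=2M
    (1 * M, 2 * M),  # D: min=1M, shared_usage=2M
    (0, 0),          # E: min=0, no shared charge
]
protection = shared_inode_min(parent_min=2 * M, hosted_usage=2 * M,
                              children=children)
print(protection // M)  # 2
```

With these numbers C and D contribute 1M each, so the shared inode ends
up with 2M of protection, while E contributes nothing.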

Would something like this work?

Thanks.

-- 
tejun
