Date:   Wed, 11 May 2022 18:34:39 +0200
From:   Michal Koutný <mkoutny@...e.com>
To:     Roman Gushchin <roman.gushchin@...ux.dev>
Cc:     Vasily Averin <vvs@...nvz.org>, Vlastimil Babka <vbabka@...e.cz>,
        Shakeel Butt <shakeelb@...gle.com>, kernel@...nvz.org,
        Florian Westphal <fw@...len.de>, linux-kernel@...r.kernel.org,
        Michal Hocko <mhocko@...e.com>, cgroups@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Tejun Heo <tj@...nel.org>
Subject: Re: kernfs memcg accounting

On Tue, May 10, 2022 at 08:06:24PM -0700, Roman Gushchin <roman.gushchin@...ux.dev> wrote:
> My primary goal was to apply the memory pressure on memory cgroups with a lot
> of (dying) children cgroups. On a multi-cpu machine a memory cgroup structure
> is way larger than a page, so a cgroup which looks small can be really large
> if we calculate the amount of memory taken by all children memcg internals.
> 
> Applying this pressure to another cgroup (e.g. the one which contains systemd)
> doesn't help to reclaim any pages which are pinning the dying cgroups.

Just a note -- this is another use case: cgroups created from within the
subtree (e.g. a container). I agree that the cgroup-manager/systemd case is
also valid (as dying memcgs may accumulate after a restart).

memcgs, with their retained state and its memory footprint, are special.

> For other controllers (maybe blkcg aside, idk) it shouldn't matter, because
> there is no such problem there.
> 
> For consistency reasons I'd suggest to charge all *large* allocations
> (e.g. percpu) to the parent cgroup. Small allocations can be ignored.

Strictly speaking, this would mean that any controller would have an
implicit dependency on the memory controller (such as the io controller
has).
In the extreme case, even a controller-less hierarchy would have such a
requirement (for precise kernfs_node accounting).
Such a dependency is not enforceable on v1 (with various topologies of
different hierarchies).
Although I initially favored consistency with the memory controller too,
I think it's simpler to charge to the creator's memcg to achieve
consistency across v1 and v2 :-)
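
To make the difference between the two charging targets concrete, here is
a toy userspace model. It is not kernel code: every name in it (toy_memcg,
toy_cgroup, charge_target_parent, charge_target_creator, toy_charge) is
made up for illustration, and it assumes the simplification that a cgroup
either has an associated memcg or has none.

```c
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct toy_memcg {
	const char *name;
	size_t charged;			/* bytes charged so far */
};

struct toy_cgroup {
	struct toy_cgroup *parent;
	struct toy_memcg *memcg;	/* NULL: no memory controller here */
};

/*
 * Policy A: charge to the parent cgroup. Walk up from the parent of the
 * new cgroup to the nearest ancestor that has a memcg. Returns NULL when
 * the hierarchy has no memory controller at all (assumed v1-like case).
 */
static struct toy_memcg *charge_target_parent(const struct toy_cgroup *parent)
{
	for (; parent; parent = parent->parent)
		if (parent->memcg)
			return parent->memcg;
	return NULL;
}

/*
 * Policy B: charge to the memcg of the task that creates the cgroup.
 * The target does not depend on the hierarchy the new cgroup lives in.
 */
static struct toy_memcg *charge_target_creator(struct toy_memcg *creator)
{
	return creator;
}

/* Account bytes to target; fails with -1 when there is no target. */
static int toy_charge(struct toy_memcg *target, size_t bytes)
{
	if (!target)
		return -1;
	target->charged += bytes;
	return 0;
}
```

In this model, policy A has nothing to charge when the new cgroup sits in
a hierarchy without a memcg, while policy B always has a target as long as
the creating task belongs to some memcg.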

Michal
