Message-ID: <20170829152426.GL491396@devbig577.frc2.facebook.com>
Date:   Tue, 29 Aug 2017 08:24:27 -0700
From:   Tejun Heo <tj@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     lizefan@...wei.com, hannes@...xchg.org, mingo@...hat.com,
        longman@...hat.com, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org, kernel-team@...com, pjt@...gle.com,
        luto@...capital.net, efault@....de, torvalds@...ux-foundation.org,
        guro@...com
Subject: Re: [PATCH 3/3] cgroup: Implement cgroup2 basic CPU usage accounting

Hello, Peter.

On Tue, Aug 29, 2017 at 04:32:52PM +0200, Peter Zijlstra wrote:
> So I mostly like. On accounting it only adds to the immediate cgroup (if
> it has a parent, aka !root).
> 
> On update it does a DFS of all sub-groups and propagates the deltas up
> to the requested group.
...
> What I don't get is why you need cgroup_cpu_stat_updated(). That is, I
> see you use it to keep the DFS 'stack' up-to-date, but what I
> don't see is why you'd need that.

That is to make reading stats O(number of descendants which have been
active since the last read) instead of O(number of all descendants), as
there can be a lot of not-too-active cgroups in a system.  Stat
reading can be frequent, so the combination can get really bad.  By
keeping the updated list separate, increasing the read frequency
decreases the cost of each read.
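
To illustrate the cost argument above, here is a toy Python model, not
the kernel code.  The class and field names (Cgroup, local, pending,
updated) are made up for this sketch, and a plain set stands in for the
updated list.  Accounting only touches the immediate cgroup and links it
into its ancestors' updated sets; flushing visits only the linked
cgroups and returns how many were visited.

```python
class Cgroup:
    """Toy stand-in for a cgroup; all names here are hypothetical."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = 0        # usage charged here since the last flush
        self.pending = 0      # deltas pushed up by flushed children
        self.total = 0        # flushed usage, including descendants
        self.updated = set()  # children whose subtree has unflushed deltas

    def account(self, delta):
        # Hot path: charge the immediate cgroup only.
        self.local += delta
        # Link into the ancestors' updated sets; stop as soon as a
        # level is already linked, since everything above it is too.
        node = self
        while node.parent is not None and node not in node.parent.updated:
            node.parent.updated.add(node)
            node = node.parent

    def flush(self):
        """Fold deltas from updated descendants; return cgroups visited."""
        visited = 1
        for child in self.updated:
            visited += child.flush()
        self.updated.clear()
        delta = self.local + self.pending
        self.local = self.pending = 0
        self.total += delta
        if self.parent is not None:
            self.parent.pending += delta
        return visited


root = Cgroup("root")
kids = [Cgroup("c%d" % i, root) for i in range(1000)]
kids[1].account(5)
kids[2].account(7)
print(root.flush(), root.total)  # 3 13: root plus the two active children
print(root.flush())              # 1: nothing was active since the last read
```

In this model the 998 idle children are never looked at, and a second
read right after the first one only touches the root.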

Also, please note that a system may end up with a lot of cgroups
without the user intending to.  memcg drains removed cgroups lazily
and the number of draining cgroups can reach very high numbers if the
system isn't under memory pressure.  The plan is to add basic stats
for other resources too and keeping it scalable w.r.t. idle cgroups
allows using the same mechanism for all resources.

> Have a look at walk_tg_tree_from(), I think we can do something like
> that on struct cgroup_subsys_state, it has that children list and the
> parent pointer.
> 
> And yes, walk_tg_tree_from() is tricky, it always takes a fair while to
> remember how it works.

We can propagate an "updated" flag up the tree (we need to, otherwise
we can't tell which subtree to descend into) and prune the iteration on
subtrees which haven't been updated; however, this can still become
very costly depending on the topology, as it can't jump over the
siblings which haven't been updated.
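
A toy Python model of that flag-based alternative, again not kernel code
and with made-up names (Node, dirty, mark_updated, pruned_walk), shows
the sibling problem.  The walk skips the subtree below a clean node, but
it still has to examine every child of an updated parent to find out
which ones are clean.

```python
class Node:
    """Toy tree node with a per-node 'updated' flag (hypothetical names)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.dirty = False
        if parent is not None:
            parent.children.append(self)


def mark_updated(node):
    # Propagate the flag up; stop at the first ancestor already marked.
    while node is not None and not node.dirty:
        node.dirty = True
        node = node.parent


def pruned_walk(root):
    """Return (flushed, examined) node counts for one stat read."""
    flushed = examined = 0
    stack = [root]
    while stack:
        node = stack.pop()
        examined += 1
        if not node.dirty:
            continue  # prune: nothing at or below this node was updated
        node.dirty = False
        flushed += 1
        # Every child has to be examined, updated or not.
        stack.extend(node.children)
    return flushed, examined


root = Node()
kids = [Node(root) for _ in range(1000)]
mark_updated(kids[7])
print(pruned_walk(root))  # (2, 1001): two nodes flushed, all siblings examined
print(pruned_walk(root))  # (0, 1): a clean root is pruned immediately
```

With one active child out of 1000, the read still examines 1001 nodes
here, which is the topology-dependent cost described above.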

Thanks.

-- 
tejun
