Message-ID: <20170801214038.sp4lw6lrnl3votte@hirez.programming.kicks-ass.net>
Date: Tue, 1 Aug 2017 23:40:38 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Tejun Heo <tj@...nel.org>
Cc: lizefan@...wei.com, hannes@...xchg.org, mingo@...hat.com,
longman@...hat.com, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, kernel-team@...com, pjt@...gle.com,
luto@...capital.net, efault@....de, torvalds@...ux-foundation.org,
guro@...com
Subject: Re: [PATCH 2/2] sched: Implement interface for cgroup unified hierarchy

On Tue, Aug 01, 2017 at 01:17:45PM -0700, Tejun Heo wrote:
> > What about the whole double accounting thing? Because currently cpuacct
> > and cpu do a fair bit of duplication. It would be very good to get rid
> > of that.
>
> I'm not that sure at this point. Here are my current thoughts on
> cpuacct.
>
> * It is useful to have basic cpu statistics on cgroup without having
> to enable the cpu controller, especially because enabling cpu
> controller always changes how cpu cycles are distributed and
> currently comes at some performance overhead.
>
> * On cgroup2, there is only one hierarchy. It'd be great to have
> basic resource accounting enabled by default on all cgroups. Note
> that we couldn't do that on v1 because there could be any number of
> hierarchies and the cost would increase with the number of
> hierarchies.

Yes, the whole single hierarchy thing makes doing away with the double
accounting possible.

> * It is bothersome that we're walking up the tree each time for
> cpuacct although being percpu && just walking up the tree makes it
> relatively cheap.

So even if it's only CPU-local accounting, you still have all the pointer
chasing and cache misses, not to mention that a faster O(depth) is still
O(depth).

> Anyways, I'm thinking about shifting the
> aggregation to the reader side so that the hot path always only
> updates local counters in a way which can scale even when there are
> a lot of (idle) cgroups. Will follow up on this later.

Not entirely sure I follow; we currently only update the current cgroup
and its immediate parents, no? Or are you looking to only account into
the current cgroup and propagate into the parents on reading?
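
To make the two schemes in this exchange concrete, here is a minimal
userspace sketch in C. It is not kernel code: struct cg, charge_walk(),
charge_local(), read_subtree() and the fixed NR_CPUS/MAX_CHILDREN limits
are all made up for illustration, and locking, RCU and real per-CPU
allocation are left out. It only shows where the O(depth) cost lands:
on every update when the writer walks up the tree, or on the (rarer)
read when only the local counter is updated.

```c
#include <assert.h>

#define NR_CPUS		4	/* assumption: small fixed CPU count */
#define MAX_CHILDREN	4	/* assumption: small fixed fan-out */

/* Hypothetical, simplified cgroup node; not the kernel's structure. */
struct cg {
	struct cg *parent;
	struct cg *children[MAX_CHILDREN];
	int nr_children;
	unsigned long long usage[NR_CPUS];	/* per-CPU counters */
};

static void cg_link(struct cg *child, struct cg *parent)
{
	assert(parent->nr_children < MAX_CHILDREN);
	child->parent = parent;
	parent->children[parent->nr_children++] = child;
}

/* Writer-side aggregation: charge the cgroup and every ancestor, O(depth). */
static void charge_walk(struct cg *cg, int cpu, unsigned long long delta)
{
	for (; cg; cg = cg->parent)
		cg->usage[cpu] += delta;
}

/* Reader-side aggregation: the hot path touches one local counter only. */
static void charge_local(struct cg *cg, int cpu, unsigned long long delta)
{
	cg->usage[cpu] += delta;
}

/* Sum one cgroup's own per-CPU counters. */
static unsigned long long sum_cpus(const struct cg *cg)
{
	unsigned long long total = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		total += cg->usage[cpu];
	return total;
}

/* Reader pays for the hierarchy: own counters plus all descendants. */
static unsigned long long read_subtree(const struct cg *cg)
{
	unsigned long long total = sum_cpus(cg);
	int i;

	for (i = 0; i < cg->nr_children; i++)
		total += read_subtree(cg->children[i]);
	return total;
}

/* Build root -> a -> b twice and check both schemes report the same totals. */
int compare_schemes(void)
{
	struct cg w_root = {0}, w_a = {0}, w_b = {0};	/* writer-side tree */
	struct cg r_root = {0}, r_a = {0}, r_b = {0};	/* reader-side tree */

	cg_link(&w_a, &w_root);
	cg_link(&w_b, &w_a);
	cg_link(&r_a, &r_root);
	cg_link(&r_b, &r_a);

	charge_walk(&w_b, 0, 5);
	charge_walk(&w_a, 1, 3);
	charge_walk(&w_b, 2, 2);

	charge_local(&r_b, 0, 5);
	charge_local(&r_a, 1, 3);
	charge_local(&r_b, 2, 2);

	return sum_cpus(&w_root) == read_subtree(&r_root) &&
	       sum_cpus(&w_a) == read_subtree(&r_a) &&
	       sum_cpus(&w_b) == read_subtree(&r_b);
}
```

In this toy version the reader walks the whole subtree, so a read costs
O(number of descendants); a real implementation would presumably want to
visit only the cgroups that changed since the last read, which the sketch
does not attempt.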