linux-kernel - Re: [PATCH 2/2] sched: Implement interface for cgroup unified hierarchy


Message-ID: <20170801201745.GA2311718@devbig577.frc2.facebook.com>
Date:   Tue, 1 Aug 2017 13:17:45 -0700
From:   Tejun Heo <tj@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     lizefan@...wei.com, hannes@...xchg.org, mingo@...hat.com,
        longman@...hat.com, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org, kernel-team@...com, pjt@...gle.com,
        luto@...capital.net, efault@....de, torvalds@...ux-foundation.org,
        guro@...com
Subject: Re: [PATCH 2/2] sched: Implement interface for cgroup unified
 hierarchy

Hello, Peter.

On Sat, Jul 29, 2017 at 11:17:07AM +0200, Peter Zijlstra wrote:
> > * "cpu.shares" is replaced with "cpu.weight" and operates on the
> >   standard scale defined by CGROUP_WEIGHT_MIN/DFL/MAX (1, 100, 10000).
> >   The weight is scaled to scheduler weight so that 100 maps to 1024
> >   and the ratio relationship is preserved - if weight is W and its
> >   scaled value is S, W / 100 == S / 1024.  While the mapped range is a
> >   bit smaller than the original scheduler weight range, the dead zones
> >   on both sides are relatively small and it covers a wider range than
> >   the nice value mappings.  This file doesn't make sense in the root
> >   cgroup and isn't create on root.
> 
> s/create/&d/

Updated, thanks.
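
For reference, the scaling in the quoted "cpu.weight" item can be
sketched as below.  This is a minimal userspace illustration of the
W / 100 == S / 1024 relationship, not the code from the patch.  The
helper names, the clamping and the round-to-nearest behavior are
assumptions made for the example.

```c
#include <stdint.h>

/* Standard cgroup2 weight scale quoted above: min 1, default 100, max 10000. */
#define CGROUP_WEIGHT_MIN 1
#define CGROUP_WEIGHT_DFL 100
#define CGROUP_WEIGHT_MAX 10000

/* Scheduler weight that the default cgroup weight maps to (from the text). */
#define SCHED_WEIGHT_DFL 1024

/*
 * Hypothetical helper: map a cgroup weight W to a scheduler weight S so
 * that W / 100 == S / 1024.  Out-of-range input is clamped (assumption)
 * and the division rounds to nearest (assumption).
 */
static uint64_t cgroup_weight_to_sched(uint64_t w)
{
	if (w < CGROUP_WEIGHT_MIN)
		w = CGROUP_WEIGHT_MIN;
	if (w > CGROUP_WEIGHT_MAX)
		w = CGROUP_WEIGHT_MAX;
	return (w * SCHED_WEIGHT_DFL + CGROUP_WEIGHT_DFL / 2) / CGROUP_WEIGHT_DFL;
}

/* Hypothetical inverse: map a scheduler weight back to the cgroup scale. */
static uint64_t sched_weight_to_cgroup(uint64_t s)
{
	return (s * CGROUP_WEIGHT_DFL + SCHED_WEIGHT_DFL / 2) / SCHED_WEIGHT_DFL;
}
```

With these definitions the default weight 100 maps to 1024, and the
two ends of the range, 1 and 10000, map to 10 and 102400.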

> > * "cpu.rt_runtime_us" and "cpu.rt_period_us" are replaced by
> >   "cpu.rt.max" which contains both runtime and period.
> 
> So we've been looking at overhauling the whole RT stuff. But sadly we've
> not been able to find something that works with all the legacy
> constraints (like RT tasks having arbitrary affinities).
> 
> Lets just hope we can preserve this interface :/

Ah, should have dropped this from the description.  Yeah, we can wait
till the RT side settles down and go for a better matching interface
as necessary.

> > v3: - Added "cpu.weight.nice" to allow using nice values when
> >       configuring the weight.  The feature is requested by PeterZ.
> >     - Merge the patch to enable threaded support on cpu and cpuacct.
> 
> >     - Dropped the bits about getting rid of cpuacct from patch
> >       description as there is a pretty strong case for making cpuacct
> >       an implicit controller so that basic cpu usage stats are always
> >       available.
> 
> What about the whole double accounting thing? Because currently cpuacct
> and cpu do a fair bit of duplication. It would be very good to get rid
> of that.

I'm not that sure at this point.  Here are my current thoughts on
cpuacct.

* It is useful to have basic cpu statistics on a cgroup without having
  to enable the cpu controller, especially because enabling the cpu
  controller always changes how cpu cycles are distributed and
  currently comes with some performance overhead.

* On cgroup2, there is only one hierarchy.  It'd be great to have
  basic resource accounting enabled by default on all cgroups.  Note
  that we couldn't do that on v1 because there could be any number of
  hierarchies and the cost would increase with the number of
  hierarchies.

* It is bothersome that we're walking up the tree each time for
  cpuacct, although the counters being percpu and the update being
  just a walk up the tree makes it relatively cheap.  Anyways, I'm
  thinking about shifting the aggregation to the reader side so that
  the hot path always only updates local counters in a way which can
  scale even when there are a lot of (idle) cgroups.  Will follow up
  on this later.
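
To illustrate the last point, here is a rough userspace sketch of
reader-side aggregation.  It is not kernel code and not a proposed
design; struct cg_node, the fixed NR_CPUS / MAX_CHILDREN limits and
all function names are made up for the example.  The hot path only
touches the local per-cpu counter of the cgroup being charged, and the
reader pays for summing over the subtree.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS      4	/* assumption: small fixed CPU count */
#define MAX_CHILDREN 8	/* assumption: small fixed fan-out */

/* Hypothetical cgroup node holding only its own per-cpu usage. */
struct cg_node {
	struct cg_node *children[MAX_CHILDREN];
	size_t nr_children;
	uint64_t local_ns[NR_CPUS];
};

/* Link @child under @parent; returns 0 on success, -1 if @parent is full. */
static int cg_add_child(struct cg_node *parent, struct cg_node *child)
{
	if (parent->nr_children >= MAX_CHILDREN)
		return -1;
	parent->children[parent->nr_children++] = child;
	return 0;
}

/* Hot path: update the local counter only, no walk up the tree. */
static void cg_charge(struct cg_node *cg, unsigned int cpu, uint64_t delta_ns)
{
	cg->local_ns[cpu % NR_CPUS] += delta_ns;
}

/* Reader side: sum the local counters of @cg and all its descendants. */
static uint64_t cg_read_usage(const struct cg_node *cg)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < NR_CPUS; i++)
		total += cg->local_ns[i];
	for (i = 0; i < cg->nr_children; i++)
		total += cg_read_usage(cg->children[i]);
	return total;
}
```

Note that this naive reader still visits every descendant, idle or
not.  Something real would need a way to skip cgroups which haven't
been updated since the last read, and would need proper per-cpu
synchronization, both of which are left out here.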

Thanks.

-- 
tejun