linux-kernel - [PATCH v2 0/6] CFS Bandwidth Control

Message-ID: <20100428110720.7954.53537.stgit@kitami.corp.google.com>
Date:	Wed, 28 Apr 2010 04:16:46 -0700
From:	Paul Turner <pjt@...gle.com>
To:	linux-kernel@...r.kernel.org
Cc:	Paul Menage <menage@...gle.com>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	Dhaval Giani <dhaval@...ux.vnet.ibm.com>,
	Gautham R Shenoy <ego@...ibm.com>,
	Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
	Herbert Poetzl <herbert@...hfloor.at>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Chris Friesen <cfriesen@...tel.com>,
	Avi Kivity <avi@...hat.com>,
	Bharata B Rao <bharata@...ux.vnet.ibm.com>,
	Nikhil Rao <ncrao@...gle.com>, Ingo Molnar <mingo@...e.hu>,
	Pavel Emelyanov <xemul@...nvz.org>,
	Mike Waychison <mikew@...gle.com>,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: [PATCH v2 0/6] CFS Bandwidth Control

Hi all,

Please find attached v2 of our proposed approach for bandwidth provisioning
under CFS.  Bharata's original RFC motivating discussion on this topic can be
found at: http://lkml.org/lkml/2009/6/4/24

This is an evolution of our previous posting: http://lkml.org/lkml/2010/2/12/393
The improvements herein are incremental: hierarchical task tracking for better
load-balance under throttle conditions, statistics export for decision
guidance in user-space control systems, minor bug fixes, and some code
clean-up.

The skeleton of our approach is as follows:
- We maintain a global, per-tg pool of unassigned quota.  On it we track the
  bandwidth period, quota per period, and runtime remaining in the current
  period.  As bandwidth is used within a period it is decremented from
  runtime.  Runtime is currently synchronized using a spinlock.  In the
  current implementation there's no reason this couldn't be done using atomic
  ops instead; however, the spinlock allows for a little more flexibility in
  experimentation with other schemes.
- When a cfs_rq participating in a bandwidth constrained task_group executes,
  it acquires time from the global pool in chunks of
  sysctl_sched_cfs_bandwidth_slice (default currently 10ms).  This
  synchronizes under rq->lock and is part of the update_curr path.
- Throttled entities are dequeued immediately.  Throttled entities are gated
  from participating in the tree at the {enqueue, dequeue}_entity level.
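
The skeleton above can be modelled in userspace roughly as follows.  This is
only a sketch under stated assumptions, not code from the patches: the names
bw_pool, local_rq, pool_acquire, rq_account and period_refresh are
hypothetical, a C11 atomic_flag stands in for the kernel spinlock, and the
"throttled" flag stands in for dequeueing the entity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the global, per-tg pool (not the patch's struct). */
struct bw_pool {
	atomic_flag lock;	/* stand-in for the kernel spinlock */
	uint64_t period_ns;	/* bandwidth period */
	uint64_t quota_ns;	/* quota per period */
	uint64_t runtime_ns;	/* runtime remaining in the current period */
};

/* Hypothetical model of one cfs_rq's local view. */
struct local_rq {
	uint64_t assigned_ns;	/* pulled from the pool, not yet consumed */
	int throttled;		/* 1 once the pool could not supply more */
};

static void pool_init(struct bw_pool *p, uint64_t period_ns, uint64_t quota_ns)
{
	atomic_flag_clear(&p->lock);
	p->period_ns = period_ns;
	p->quota_ns = quota_ns;
	p->runtime_ns = quota_ns;
}

/* Take up to slice_ns from the pool; returns the amount actually granted. */
static uint64_t pool_acquire(struct bw_pool *p, uint64_t slice_ns)
{
	uint64_t got;

	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;	/* spin until the lock is free */
	got = p->runtime_ns < slice_ns ? p->runtime_ns : slice_ns;
	p->runtime_ns -= got;
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
	return got;
}

/* Charge delta_ns of execution, refilling in slice_ns chunks as needed. */
static void rq_account(struct bw_pool *p, struct local_rq *rq,
		       uint64_t delta_ns, uint64_t slice_ns)
{
	assert(slice_ns > 0);
	while (rq->assigned_ns < delta_ns) {
		uint64_t got = pool_acquire(p, slice_ns);

		if (got == 0) {		/* pool exhausted: throttle */
			rq->assigned_ns = 0;
			rq->throttled = 1;
			return;
		}
		rq->assigned_ns += got;
	}
	rq->assigned_ns -= delta_ns;
}

/* At period refresh, refill the pool and unthrottle the given runqueues. */
static void period_refresh(struct bw_pool *p, struct local_rq *rqs, size_t n)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;
	p->runtime_ns = p->quota_ns;
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
	for (size_t i = 0; i < n; i++)
		rqs[i].throttled = 0;
}
```

In this model a smaller slice leaves less runtime stranded on any one
runqueue but takes the pool lock more often, which is the trade-off behind
the slice-size question.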

More details on the motivation and approach, as well as performance benchmark
results, can be found in the original posting.

One caveat that bears discussion is that this leads to an alternate
specification of bandwidth versus the sched_rt case.  The defined bandwidth
becomes an absolute quantifier relative to the period and is agnostic of allowed
cpus.
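
To make the difference concrete, here is a small arithmetic sketch.  It
assumes, purely for contrast, a per-cpu reading in which every allowed cpu
receives its own runtime each period; the function names are hypothetical
and nothing here is taken from the patches.

```c
#include <stdint.h>

/*
 * Absolute reading (this proposal): the group may consume at most quota_ns
 * of cpu time per period in total, however many cpus it is allowed on.
 */
static uint64_t cap_absolute_ns(uint64_t quota_ns, unsigned int allowed_cpus)
{
	(void)allowed_cpus;	/* deliberately ignored */
	return quota_ns;
}

/*
 * Per-cpu reading (assumed here for contrast only): each allowed cpu gets
 * runtime_ns per period, so the total scales with the cpu count.
 */
static uint64_t cap_per_cpu_ns(uint64_t runtime_ns, unsigned int allowed_cpus)
{
	return runtime_ns * allowed_cpus;
}
```

With 50ms of quota per 100ms period and 4 allowed cpus, the absolute reading
caps the group at 50ms of cpu time per period (half a cpu in aggregate),
while the per-cpu reading would allow up to 200ms.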

Open questions:
- Is there any value in having the slice be tunable at the task-group level?
- I suspect 5ms may be a better default slice value; however, I have not had
  the opportunity to verify this yet.  There's also room for some dynamic
  range here.

Acknowledgements:
We would like to thank Bharata B Rao and Dhaval Giani for discussion and their
original proposal; many elements in this patchset are directly inspired by
their original posting.  Bharata has also been integral in the preparation of
this second version, providing valuable feedback and review.

Ken Chen also provided early review and comments.

Thanks,

- Paul and Nikhil
---

Nikhil Rao (1):
      sched: add exports tracking cfs bandwidth control statistics

Paul Turner (5):
      sched: introduce primitives to account for CFS bandwidth tracking
      sched: accumulate per-cfs_rq cpu usage
      sched: throttle cfs_rq entities which exceed their local quota
      sched: unthrottle cfs_rq(s) who ran out of quota at period refresh
      sched: hierarchical task accounting for FAIR_GROUP_SCHED


 include/linux/sched.h |    4 +
 init/Kconfig          |    9 +
 kernel/sched.c        |  347 +++++++++++++++++++++++++++++++++++++++++++++----
 kernel/sched_fair.c   |  240 +++++++++++++++++++++++++++++++++-
 kernel/sched_rt.c     |   24 +--
 kernel/sysctl.c       |   10 +
 6 files changed, 585 insertions(+), 49 deletions(-)
