[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170509161740.GD8609@htj.duckdns.org>
Date: Tue, 9 May 2017 12:17:40 -0400
From: Tejun Heo <tj@...nel.org>
To: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Mike Galbraith <efault@....de>, Paul Turner <pjt@...gle.com>,
Chris Mason <clm@...com>, kernel-team@...com
Subject: [PATCH v2 for-4.12-fixes 1/2] sched/fair: Use task_groups instead of
leaf_cfs_rq_list to walk all cfs_rqs
Currently, rq->leaf_cfs_rq_list is a traversal ordered list of all
live cfs_rqs which have ever been active on the CPU; unfortunately,
this makes update_blocked_averages() O(total number of CPU cgroups)
which isn't scalable at all.
The next patch will make rq->leaf_cfs_rq_list only contain the cfs_rqs
which are currently active. In preparation, this patch converts users
which need to traverse all cfs_rqs to use task_groups list instead.
task_groups list is protected by its own lock and allows RCU protected
traversal and the order of operations guarantees that all online
cfs_rqs will be visited, but holding rq->lock won't protect against
iterating an already unregistered cfs_rq. However, the operations of
the two users that get converted - update_runtime_enabled() and
unthrottle_offline_cfs_rqs() - should be safe to perform on already
dead cfs_rqs, so adding rcu read protection around them should be
enough.
Note that print_cfs_stats() is not converted. The next patch will
change its behavior to print out only active cfs_rqs, which is
intended as there's not much point in printing out idle cfs_rqs.
v2: Dropped strong synchronization around removal and left
print_cfs_stats() unchanged as suggested by Peterz.
Signed-off-by: Tejun Heo <tj@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Mike Galbraith <efault@....de>
Cc: Paul Turner <pjt@...gle.com>
Cc: Chris Mason <clm@...com>
Cc: stable@...r.kernel.org
---
Peter,
Updated the patch - pasted in your version and updated the description
accordingly. Please feel free to use this one or whichever you like.
Based on mainline and stable is cc'd.
Thanks.
kernel/sched/fair.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4644,22 +4644,32 @@ static void destroy_cfs_bandwidth(struct
static void __maybe_unused update_runtime_enabled(struct rq *rq)
{
- struct cfs_rq *cfs_rq;
+ struct task_group *tg;
- for_each_leaf_cfs_rq(rq, cfs_rq) {
- struct cfs_bandwidth *cfs_b = &cfs_rq->tg->cfs_bandwidth;
+ lockdep_assert_held(&rq->lock);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(tg, &task_groups, list) {
+ struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+ struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
raw_spin_lock(&cfs_b->lock);
cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
raw_spin_unlock(&cfs_b->lock);
}
+ rcu_read_unlock();
}
static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
{
- struct cfs_rq *cfs_rq;
+ struct task_group *tg;
+
+ lockdep_assert_held(&rq->lock);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(tg, &task_groups, list) {
+ struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
- for_each_leaf_cfs_rq(rq, cfs_rq) {
if (!cfs_rq->runtime_enabled)
continue;
@@ -4677,6 +4687,7 @@ static void __maybe_unused unthrottle_of
if (cfs_rq_throttled(cfs_rq))
unthrottle_cfs_rq(cfs_rq);
}
+ rcu_read_unlock();
}
#else /* CONFIG_CFS_BANDWIDTH */
Powered by blists - more mailing lists