[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170504133122.a6qjlj3hlblbjxux@hirez.programming.kicks-ass.net>
Date: Thu, 4 May 2017 15:31:22 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Tejun Heo <tj@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>,
“linux-kernel@...r.kernel.org”
<linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Mike Galbraith <efault@....de>, Paul Turner <pjt@...gle.com>,
Chris Mason <clm@...com>,
“kernel-team@...com” <kernel-team@...com>
Subject: Re: [PATCH 1/2] sched/fair: Use task_groups instead of
leaf_cfs_rq_list to walk all cfs_rqs
On Tue, Apr 25, 2017 at 05:40:39PM -0700, Tejun Heo wrote:
> task_groups list is protected by its own lock. While it allows RCU
> protected traversal and the order of operations guarantees that all
> online cfs_rqs will be visited, holding rq->lock won't protect against
> iterating an already unregistered cfs_rq.
So I think the below is sufficient.
Yes we can hit an (almost) dead cfs_rq, but poking the bandwidth
variables thereof is harmless.
unthrottle_cfs_rq() also stops doing anything much when it finds the
cfs_rq is empty, which must be the case if we're removing it.
I don't know Paul's opinion on RCU GPs happening while stop_machine(),
but just in case he feels that's fair game, I did add the
rcu_read_lock() thingies.
The lockdep assert is mostly documentation, to more easily see
it is indeed held when we get there.
I left print_cfs_stats using the leaf list, no point in printing stuff
that's empty.
This way we can avoid taking all RQ locks on cgroup destruction.
---
kernel/sched/fair.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd6c3f957d44..c897ad6d714c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4644,22 +4644,32 @@ static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
static void __maybe_unused update_runtime_enabled(struct rq *rq)
{
- struct cfs_rq *cfs_rq;
+ struct task_group *tg;
- for_each_leaf_cfs_rq(rq, cfs_rq) {
- struct cfs_bandwidth *cfs_b = &cfs_rq->tg->cfs_bandwidth;
+ lockdep_assert_held(&rq->lock);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(tg, &task_groups, list) {
+ struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+ struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
raw_spin_lock(&cfs_b->lock);
cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
raw_spin_unlock(&cfs_b->lock);
}
+ rcu_read_unlock();
}
static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
{
- struct cfs_rq *cfs_rq;
+ struct task_group *tg;
+
+ lockdep_assert_held(&rq->lock);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(tg, &task_groups, list) {
+ struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
- for_each_leaf_cfs_rq(rq, cfs_rq) {
if (!cfs_rq->runtime_enabled)
continue;
@@ -4677,6 +4687,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
if (cfs_rq_throttled(cfs_rq))
unthrottle_cfs_rq(cfs_rq);
}
+ rcu_read_unlock();
}
#else /* CONFIG_CFS_BANDWIDTH */
Powered by blists - more mailing lists