Message-Id: <1364457537-15114-5-git-send-email-iamjoonsoo.kim@lge.com>
Date: Thu, 28 Mar 2013 16:58:55 +0900
From: Joonsoo Kim <iamjoonsoo.kim@....com>
To: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Mike Galbraith <efault@....de>,
Paul Turner <pjt@...gle.com>, Alex Shi <alex.shi@...el.com>,
Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Morten Rasmussen <morten.rasmussen@....com>,
Namhyung Kim <namhyung@...nel.org>,
Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: [PATCH 4/5] sched: don't consider upper se in sched_slice()
Walking up the upper se hierarchy in sched_slice() should not be done,
because sched_slice() is used to check whether a resched is needed
within *this* cfs_rq, and the current implementation has a problem
related to this.
The problem is that if we walk up the upper se hierarchy in
sched_slice(), we can end up with an ideal slice that is lower than
sysctl_sched_min_granularity.
For example, assume that 4 tgs with the same share are attached to the
root tg, and each of them has 20 runnable tasks on cpu0.
In this case, __sched_period() returns sysctl_sched_min_granularity * 20
and we enter the loop. In the first iteration, we compute this task's
portion of the slice on its own cfs_rq and get sysctl_sched_min_granularity.
In the second iteration, we get a slice which is a quarter of
sysctl_sched_min_granularity, because there are 4 tgs with the same
share on the root cfs_rq.
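To make the arithmetic concrete, here is a minimal userspace sketch of
the calculation (an illustration, not kernel code); it assumes equal
weights everywhere, so each calc_delta_mine() step reduces to a plain
division, and it uses the 2.25ms min_granularity from the test setup
below.

#include <stdio.h>

int main(void)
{
	/* values from the test setup described below */
	unsigned long long min_gran_ns = 2250000ULL;	/* 2.25ms */
	unsigned long long nr_tasks = 20;		/* runnable tasks per tg */
	unsigned long long nr_tgs = 4;			/* equal-share tgs under root */

	/* __sched_period(): 20 > sched_nr_latency, so period = 20 * min_gran */
	unsigned long long period = nr_tasks * min_gran_ns;

	/* 1st iteration: task se against its tg's cfs_rq, 1/20 of the load */
	unsigned long long slice = period / nr_tasks;
	printf("after 1st iteration: %llu ns\n", slice);	/* 2250000 ns */

	/* 2nd iteration: tg se against the root cfs_rq, 1/4 of the load */
	slice /= nr_tgs;
	printf("after 2nd iteration: %llu ns\n", slice);	/* 562500 ns */

	return 0;
}

562500ns is well below both min_granularity and the 1ms tick, which is
what the test below demonstrates.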
Ensuring a slice larger than min_granularity is important for
performance, and since there is no lower bound on the result here other
than the timer tick, we should fix sched_slice() not to consider upper
se entities.
Below are my test results, verifying this effect on a 4-cpu machine in
the following environment.
CONFIG_HZ=1000 and CONFIG_SCHED_AUTOGROUP=y
/proc/sys/kernel/sched_min_granularity_ns is 2250000, that is, 2.25ms
(the 0.75ms default scaled by a factor of 1 + ilog2(4 cpus) = 3).
In each of 4 sessions, I ran:

for i in `seq 20`; do taskset -c 3 sh -c 'while true; do :; done' & done

and then recorded and inspected the context switches:

./perf sched record
./perf script -C 003 | grep sched_switch | cut -b -40 | less

The results are below.
*Vanilla*
sh 2724 [003] 152.52801
sh 2779 [003] 152.52900
sh 2775 [003] 152.53000
sh 2751 [003] 152.53100
sh 2717 [003] 152.53201
*With this patch*
sh 2640 [003] 147.48700
sh 2662 [003] 147.49000
sh 2601 [003] 147.49300
sh 2633 [003] 147.49400
In the vanilla case, the resulting slice is lower than 1ms, so every
tick triggers a reschedule. With the patch applied, we can see that
min_granularity is ensured.
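For reference, this is how sched_slice() reads with the patch below
applied:

static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	struct load_weight *load;
	struct load_weight lw;
	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);

	/* only the load of *this* cfs_rq is considered now */
	load = &cfs_rq->load;

	if (unlikely(!se->on_rq)) {
		lw = cfs_rq->load;

		update_load_add(&lw, se->load.weight);
		load = &lw;
	}
	slice = calc_delta_mine(slice, se->load.weight, load);

	return slice;
}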
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 204a9a9..e232421 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -631,23 +631,20 @@ static u64 __sched_period(unsigned long nr_running)
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	struct load_weight *load;
+	struct load_weight lw;
 	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
 
-	for_each_sched_entity(se) {
-		struct load_weight *load;
-		struct load_weight lw;
-
-		cfs_rq = cfs_rq_of(se);
-		load = &cfs_rq->load;
+	load = &cfs_rq->load;
 
-		if (unlikely(!se->on_rq)) {
-			lw = cfs_rq->load;
+	if (unlikely(!se->on_rq)) {
+		lw = cfs_rq->load;
 
-			update_load_add(&lw, se->load.weight);
-			load = &lw;
-		}
-		slice = calc_delta_mine(slice, se->load.weight, load);
+		update_load_add(&lw, se->load.weight);
+		load = &lw;
 	}
+	slice = calc_delta_mine(slice, se->load.weight, load);
+
 	return slice;
 }
 
--
1.7.9.5