Date:	Mon, 10 Aug 2015 15:08:59 +0900
From:	byungchul.park@....com
To:	mingo@...nel.org, peterz@...radead.org
Cc:	linux-kernel@...r.kernel.org,
	Byungchul Park <byungchul.park@....com>
Subject: [PATCH] sched: sync with the prev cfs when changing cgroup within a cpu

From: Byungchul Park <byungchul.park@....com>

The current code seems to be wrong about cfs_rq->blocked_load_avg when changing
a task's cgroup (i.e. its cfs_rq) to another. I tested with "echo pid > cgroup"
and found that cfs_rq->blocked_load_avg grew larger and larger every time I
moved the task from one cgroup to another.

It is possible to move a task between groups within a single CPU, and each
cfs_rq tracks its own blocked load, so we have to sync the se's average load
with both the *prev* cfs_rq and the next cfs_rq when changing its group.
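
For reference, the hunk below relies on two helpers that already exist in
kernel/sched/fair.c, subtract_blocked_load_contrib() and
__synchronize_entity_decay(). Roughly (a sketch from memory, not a verbatim
copy; the exact bodies may differ):

/* Remove a contribution from blocked_load_avg, clamping at zero. */
static inline void subtract_blocked_load_contrib(struct cfs_rq *cfs_rq,
                                                 long load_contrib)
{
        if (likely(load_contrib < cfs_rq->blocked_load_avg))
                cfs_rq->blocked_load_avg -= load_contrib;
        else
                cfs_rq->blocked_load_avg = 0;
}

/* Decay se's load contribution up to the cfs_rq's current decay_counter. */
static inline u64 __synchronize_entity_decay(struct sched_entity *se)
{
        struct cfs_rq *cfs_rq = cfs_rq_of(se);
        u64 decays = atomic64_read(&cfs_rq->decay_counter);

        decays -= se->avg.decay_count;
        se->avg.decay_count = 0;
        if (!decays)
                return 0;

        se->avg.load_avg_contrib = decay_load(se->avg.load_avg_contrib,
                                              decays);
        return decays;
}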

In addition, the "#ifdef CONFIG_SMP" is removed because we need to sync a
se's load with its cfs_rq even in the !SMP case; remember that it is possible
to move between groups within *a* single CPU.

I also removed the comment mentioning migrate_task_rq_fair().
migrate_task_rq_fair() can be called in three cases, and in each case both
the decay counter and the blocked load are already taken care of, so we
don't need to consider them in task_move_group_fair() at all. (A simplified
sketch of the enqueue path follows the three cases below.)

1. The wake-up migration case:
enqueue_entity_load_avg() resets se->avg.decay_count to zero after applying
it, and the task is about to be woken up, so we don't need to add its load
to cfs_rq->blocked_load_avg.

2. The fork balancing case:
se->avg.decay_count is initialized to zero in __sched_fork(), and
wake_up_new_task() calls activate_task() with flags = 0 so that
enqueue_entity_load_avg() skips adding its load to
cfs_rq->blocked_load_avg; the task will be woken up soon anyway.

3. The rq migration case (not a wake-up):
The target task is already on a rq, so we don't need to consider either its
decay counter or its blocked load in this case.
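
To make the three cases concrete, here is a simplified sketch of how
enqueue_entity_load_avg() treats such entities (again from memory, not
verbatim; see kernel/sched/fair.c for the real code). The point is that a
non-positive decay_count marks an entity whose blocked-load bookkeeping has
already been handled, so blocked_load_avg is left alone:

static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
                                           struct sched_entity *se,
                                           int wakeup)
{
        if (unlikely(se->avg.decay_count <= 0)) {
                /*
                 * Migrated or newly forked entities (the cases above):
                 * their decay and blocked load were handled elsewhere,
                 * so just mark them synchronized and on-rq again.
                 */
                se->avg.decay_count = 0;
                wakeup = 0;
        } else {
                /* A sleeper waking on the same cfs_rq: catch up on the
                 * decay missed while blocked. */
                __synchronize_entity_decay(se);
        }

        /* Only a local wakeup moves load out of blocked_load_avg. */
        if (wakeup)
                subtract_blocked_load_contrib(cfs_rq,
                                              se->avg.load_avg_contrib);

        cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;
}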

Signed-off-by: Byungchul Park <byungchul.park@....com>
---
 kernel/sched/fair.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ffa70dc..f8ab2ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8229,22 +8229,27 @@ static void task_move_group_fair(struct task_struct *p, int queued)
 	if (!queued && (!se->sum_exec_runtime || p->state == TASK_WAKING))
 		queued = 1;
 
-	if (!queued)
-		se->vruntime -= cfs_rq_of(se)->min_vruntime;
+	if (!queued) {
+		cfs_rq = cfs_rq_of(se);
+		se->vruntime -= cfs_rq->min_vruntime;
+
+		/*
+		 * we must synchronize with the prev cfs.
+		 */
+		__synchronize_entity_decay(se);
+		subtract_blocked_load_contrib(cfs_rq, se->avg.load_avg_contrib);
+	}
 	set_task_rq(p, task_cpu(p));
 	se->depth = se->parent ? se->parent->depth + 1 : 0;
 	if (!queued) {
 		cfs_rq = cfs_rq_of(se);
 		se->vruntime += cfs_rq->min_vruntime;
-#ifdef CONFIG_SMP
+
 		/*
-		 * migrate_task_rq_fair() will have removed our previous
-		 * contribution, but we must synchronize for ongoing future
-		 * decay.
+		 * we must synchronize with the next cfs for ongoing future decay.
 		 */
 		se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
 		cfs_rq->blocked_load_avg += se->avg.load_avg_contrib;
-#endif
 	}
 }
 
-- 
1.7.9.5

