Message-ID: <tip-0a702bb8af3c1b2dff355fb3c27e7f7d5285e30b@git.kernel.org>
Date:	Fri, 8 Feb 2013 07:17:47 -0800
From:	tip-bot for Vladimir Davydov <vdavydov@...allels.com>
To:	linux-tip-commits@...r.kernel.org
Cc:	linux-kernel@...r.kernel.org, hpa@...or.com, mingo@...nel.org,
	pjt@...gle.com, peterz@...radead.org, devel@...nvz.org,
	tglx@...utronix.de, vdavydov@...allels.com
Subject: [tip:sched/urgent] sched: Initialize cfs_rq->runtime_remaining to non-zero on cfs bw set

Commit-ID:  0a702bb8af3c1b2dff355fb3c27e7f7d5285e30b
Gitweb:     http://git.kernel.org/tip/0a702bb8af3c1b2dff355fb3c27e7f7d5285e30b
Author:     Vladimir Davydov <vdavydov@...allels.com>
AuthorDate: Fri, 8 Feb 2013 11:10:46 +0400
Committer:  Ingo Molnar <mingo@...nel.org>
CommitDate: Fri, 8 Feb 2013 15:14:38 +0100

sched: Initialize cfs_rq->runtime_remaining to non-zero on cfs bw set

If cfs_rq->runtime_remaining is <= 0 then either

 - cfs_rq is throttled and waiting for quota redistribution, or
 - cfs_rq is currently executing and will be throttled on put_prev_entity, or
 - cfs_rq is not throttled and has not executed since its quota was set
   (runtime_remaining is set to 0 on cfs bandwidth reconfiguration).

The last case is an exception to the rule "runtime_remaining <= 0
iff cfs_rq is throttled or will be throttled as soon as it
finishes its execution".

Moreover, it can lead to a task hang as follows. If
put_prev_task() is called immediately after the first
pick_next_task() following a quota change, "immediately" meaning
that rq->clock is the same in both functions, then the
corresponding cfs_rq will be throttled.

Besides being unfair (the cfs_rq has not in fact executed), the
quota refill timer can be idle at that time, and put_prev_task
will not activate it: update_curr calls account_cfs_rq_runtime,
which activates the timer, only if delta_exec is strictly
positive. As a result we can get a task "running" inside a
throttled cfs_rq which will probably never be unthrottled.
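
The sequence above can be sketched as a small user-space model. This
is only an illustration under stated assumptions, not the kernel
code: struct model_cfs_rq and the model_* functions are hypothetical
names, the timer is reduced to a single flag, and refills from the
global quota pool are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; field names follow the commit message. */
struct model_cfs_rq {
	int64_t runtime_remaining; /* ns of quota left on this cfs_rq */
	bool throttled;
	bool timer_active;         /* stands in for cfs_bandwidth.timer_active */
};

/*
 * Models account_cfs_rq_runtime(): charge the elapsed time and, once
 * the local quota is exhausted, make sure the refill timer is armed.
 * (Assumption: the real function does more, e.g. pulls from a pool.)
 */
static void model_account_runtime(struct model_cfs_rq *rq, uint64_t delta_exec)
{
	rq->runtime_remaining -= (int64_t)delta_exec;
	if (rq->runtime_remaining <= 0)
		rq->timer_active = true;
}

/* Models update_curr(): accounting is skipped when no time has elapsed. */
static void model_update_curr(struct model_cfs_rq *rq, uint64_t now,
			      uint64_t exec_start)
{
	uint64_t delta_exec = now - exec_start;

	if (delta_exec == 0)
		return; /* the timer is never touched on this path */

	model_account_runtime(rq, delta_exec);
}

/* Models the throttle check done when the entity is put back. */
static void model_put_prev(struct model_cfs_rq *rq, uint64_t now,
			   uint64_t exec_start)
{
	model_update_curr(rq, now, exec_start);
	if (rq->runtime_remaining <= 0)
		rq->throttled = true;
}
```

Calling model_put_prev() with now == exec_start on a model whose
runtime_remaining is 0 leaves it throttled with timer_active still
false, which is the stuck state described above.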

To avoid the problem, the patch makes tg_set_cfs_bandwidth
initialize runtime_remaining of each cfs_rq to 1 instead of 0 so
that the cfs_rq will be throttled only if it has executed for
some positive number of nanoseconds.
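
The effect of starting from 1 instead of 0 can be checked with a few
lines of arithmetic. The helper below is a hypothetical sketch (the
name throttled_after is not a kernel function) and assumes no quota
is refilled from the global pool in between.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: starting from `initial` nanoseconds of
 * runtime_remaining, would the cfs_rq be throttled after executing
 * for delta_exec nanoseconds? Uses the same "<= 0" test as the rule
 * quoted in the commit message.
 */
static bool throttled_after(int64_t initial, uint64_t delta_exec)
{
	int64_t remaining = initial - (int64_t)delta_exec;

	return remaining <= 0;
}
```

With initial == 0 the result is true even for delta_exec == 0. With
initial == 1 it is false for delta_exec == 0 and becomes true as soon
as delta_exec >= 1.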

Our customers have hit such hangs several times inside a VM
(time accounting seems to be wrong, or at least different,
there). Analysis of the crash dumps revealed that the hung tasks
were running inside cfs_rqs with the following state:

 cfs_rq->throttled=1
 cfs_rq->runtime_enabled=1
 cfs_rq->runtime_remaining=0
 cfs_rq->tg->cfs_bandwidth.idle=1
 cfs_rq->tg->cfs_bandwidth.timer_active=0

which matches the explanation given above.
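
For anyone checking their own dumps for the same signature, the state
above can be written as a predicate. This is a hedged sketch: struct
dump_state and looks_like_stuck_throttle are hypothetical names, the
fields are flattened copies of the values listed above, and testing
runtime_remaining with "<= 0" rather than "== 0" is an assumption
made to cover any exhausted quota.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened copy of the fields read from a crash dump. */
struct dump_state {
	int throttled;             /* cfs_rq->throttled */
	int runtime_enabled;       /* cfs_rq->runtime_enabled */
	int64_t runtime_remaining; /* cfs_rq->runtime_remaining */
	int idle;                  /* cfs_rq->tg->cfs_bandwidth.idle */
	int timer_active;          /* cfs_rq->tg->cfs_bandwidth.timer_active */
};

/*
 * True when a cfs_rq is throttled with no quota left while the
 * bandwidth timer is idle and inactive, so nothing is scheduled to
 * refill the quota and unthrottle it.
 */
static bool looks_like_stuck_throttle(const struct dump_state *s)
{
	return s->throttled &&
	       s->runtime_enabled &&
	       s->runtime_remaining <= 0 &&
	       s->idle &&
	       !s->timer_active;
}
```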

Signed-off-by: Vladimir Davydov <vdavydov@...allels.com>
Cc: <devel@...nvz.org>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Paul Turner <pjt@...gle.com>
Link: http://lkml.kernel.org/r/1360307446-26978-1-git-send-email-vdavydov@parallels.com
Signed-off-by: Ingo Molnar <mingo@...nel.org>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26058d0..c7a078f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7686,7 +7686,7 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)

 		raw_spin_lock_irq(&rq->lock);
 		cfs_rq->runtime_enabled = runtime_enabled;
-		cfs_rq->runtime_remaining = 0;
+		cfs_rq->runtime_remaining = 1;

 		if (cfs_rq->throttled)
 			unthrottle_cfs_rq(cfs_rq);