Message-ID: <CABAubThfDMnA8g5Fxdwwu79V9sEDgQYvvWY675757LZXnyMcKQ@mail.gmail.com>
Date:	Mon, 14 Sep 2015 22:32:42 -0700
From:	Shayan Pooya <shayan@...eve.org>
To:	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org
Subject: [PATCH] sched/fair: adjust the depth of a sched_entity when its
 parent changes

From 64a24d04c6510dcc144aba123fb21ed6f895c6b7 Mon Sep 17 00:00:00 2001
From: Shayan Pooya <shayan@...eve.org>
Date: Mon, 14 Sep 2015 21:25:09 -0700
Subject: [PATCH] sched/fair: adjust the depth of a sched_entity when its
 parent changes

Fixes commit fed14d45f945 ("sched/fair: Track cgroup depth")
I hit the kernel panic described in https://lkml.org/lkml/2014/2/15/217
while running docker with kernel 3.16.

The issue has also been reported in other places, including:

https://github.com/docker/docker/issues/13940
https://gist.github.com/burke/c60dc5b8f0ba9bfd9275

The latter also has an analysis and a similar patch (which was never
submitted to lkml).

Looking into the panic (RIP: check_preempt_wakeup+255) and the code:
  <check_preempt_wakeup+248>:  mov    0x148(%rbx),%rbx
  <check_preempt_wakeup+255>:  mov    0x150(%r12),%rdi
  <check_preempt_wakeup+263>:  cmp    0x150(%rbx),%rdi

And:
  crash> p &((struct sched_entity *)0)->cfs_rq
  $10 = (struct cfs_rq **) 0x150

So 0x150 is the offset of cfs_rq within struct sched_entity, which points
to the inlined function find_matching_se() and the while loop in it.
Inspecting the task that was about to be scheduled in
check_preempt_wakeup():

  crash> p ((struct task_struct *) 0xffff8808506fd180)->se.depth
  $2 = 1
  crash> p ((struct task_struct *) 0xffff8808506fd180)->se.parent
  $3 = (struct sched_entity *) 0xffff8808533c0c00
  crash> p ((struct task_struct *) 0xffff8808506fd180)->se.parent->depth
  $4 = 1

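This is incorrect: the depth of a sched_entity should be its parent's
depth plus one, so the task's depth should be 2 here. The stale depth is
the root cause of the panic.
The modified code is the only place where the depth was not adjusted after
potentially modifying the parent.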
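
The failure mode can be illustrated with a small userspace model. This is
only a sketch, not kernel code: struct entity, set_parent(),
find_matching() and walk_result() are hypothetical names, the integer
"queue" field stands in for se->cfs_rq, and the NULL checks exist only so
that the model reports the bad walk instead of crashing the way the kernel
does.

```c
#include <stddef.h>

/* Simplified stand-in for struct sched_entity (assumption, not kernel code). */
struct entity {
    struct entity *parent; /* group entity above this one, NULL at top level */
    int queue;             /* id of the queue this entity sits on (models se->cfs_rq) */
    int depth;             /* expected to equal parent->depth + 1, or 0 without a parent */
};

/* Re-parent an entity; refresh_depth == 0 models the missing depth update. */
static void set_parent(struct entity *e, struct entity *parent, int queue,
                       int refresh_depth)
{
    e->parent = parent;
    e->queue = queue;
    if (refresh_depth)
        e->depth = parent ? parent->depth + 1 : 0;
}

/*
 * Modeled on the walk in find_matching_se(): level both entities using
 * their depth fields, then climb in lockstep until they share a queue.
 * Returns 0 on success, -1 where the unguarded walk would hit NULL.
 */
static int find_matching(struct entity **a, struct entity **b)
{
    int da = (*a)->depth, db = (*b)->depth;

    while (da > db && *a) { da--; *a = (*a)->parent; }
    while (db > da && *b) { db--; *b = (*b)->parent; }
    while (*a && *b && (*a)->queue != (*b)->queue) {
        *a = (*a)->parent;
        *b = (*b)->parent;
    }
    return (*a && *b) ? 0 : -1;
}

/* Move task_a from group g0 into nested group g1, then match it against task_b. */
static int walk_result(int refresh_depth)
{
    struct entity g0 = { NULL, 0, 0 };     /* top-level group, owns queue 1 */
    struct entity g1 = { &g0, 1, 1 };      /* nested in g0, owns queue 2 */
    struct entity h = { NULL, 0, 0 };      /* another top-level group, owns queue 3 */
    struct entity task_a = { &g0, 1, 1 };  /* starts directly under g0 */
    struct entity task_b = { &h, 3, 1 };
    struct entity *a = &task_a, *b = &task_b;

    set_parent(&task_a, &g1, 2, refresh_depth);
    return find_matching(&a, &b);
}
```

Calling walk_result(1) from a test main() returns 0, because the
refreshed depth of 2 lets the walk meet on the top-level queue.
walk_result(0) returns -1: with the stale depth of 1 the two entities are
never leveled, and one side runs off the top of the hierarchy. That is the
point where the kernel dereferences a NULL parent.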

Signed-off-by: Shayan Pooya <shayan@...eve.org>
---
 kernel/sched/fair.c  | 1 -
 kernel/sched/sched.h | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e2e348..ced5534 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8035,7 +8035,6 @@ static void task_move_group_fair(struct task_struct *p, int queued)
     if (!queued)
         se->vruntime -= cfs_rq_of(se)->min_vruntime;
     set_task_rq(p, task_cpu(p));
-    se->depth = se->parent ? se->parent->depth + 1 : 0;
     if (!queued) {
         cfs_rq = cfs_rq_of(se);
         se->vruntime += cfs_rq->min_vruntime;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 68cda11..507d30f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -931,6 +931,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 #ifdef CONFIG_FAIR_GROUP_SCHED
     p->se.cfs_rq = tg->cfs_rq[cpu];
     p->se.parent = tg->se[cpu];
+    p->se.depth = p->se.parent ? p->se.parent->depth + 1 : 0;
 #endif

 #ifdef CONFIG_RT_GROUP_SCHED
-- 
2.1.0