Date:	Tue, 15 Sep 2015 11:20:00 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Shayan Pooya <shayan@...eve.org>
Cc:	Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
	byungchul.park@....com, burke@...bey.me
Subject: Re: [PATCH] sched/fair: adjust the depth of a sched_entity when its
 parent changes

On Mon, Sep 14, 2015 at 10:32:42PM -0700, Shayan Pooya wrote:

> Fixes commit fed14d45f945 ("sched/fair: Track cgroup depth")
> Hit this kernel panic mentioned in https://lkml.org/lkml/2014/2/15/217
> when running docker with kernel 3.16.

v3.16 includes the fix from that thread (and I had to look in my own
archives, because lkml.org fancies showing blank pages today :/).

> The issue has been reported other places including:
> 
> https://github.com/docker/docker/issues/13940
> https://gist.github.com/burke/c60dc5b8f0ba9bfd9275
> 
> The latter also has an analysis and a similar patch (which was never
> submitted to lkml).

Pretty good write up that, sad you did not Cc the guy.

I got defeated by the github web shite (again!) and could not locate an
email address for him :( Ah.. Google to the rescue!

> Which suggests the inlined function find_matching_se and the while loop
> in it. Looking into the task that was about to get scheduled in the
> check_preempt_wakeup function:
>
>   crash> p ((struct task_struct *) 0xffff8808506fd180)->se.depth
>   $2 = 1
>   crash> p ((struct task_struct *) 0xffff8808506fd180)->se.parent->depth
>   $4 = 1

Yep, buggered.
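
To spell out why a stale depth is fatal here, below is a small userspace
model of the walk that find_matching_se() does. It is a sketch from memory,
not the kernel code: struct model_se, model_find_matching_se() and the
integer queue ids are made up for illustration, and the NULL checks that
return -1 mark the spots where the real loop would just dereference a NULL
parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct sched_entity. */
struct model_se {
	struct model_se *parent;	/* group entity one level up, NULL at the root */
	int depth;			/* cached distance from the root */
	int cfs_rq;			/* id of the queue this entity sits on */
};

/*
 * Walk both entities up until they sit on the same queue.
 * Returns 0 on success, -1 where the kernel loop would hit a NULL parent.
 */
static int model_find_matching_se(struct model_se **se, struct model_se **pse)
{
	int se_depth, pse_depth;

	assert(se && *se && pse && *pse);

	se_depth = (*se)->depth;
	pse_depth = (*pse)->depth;

	/* First bring both entities to the same (cached) depth. */
	while (se_depth > pse_depth) {
		se_depth--;
		*se = (*se)->parent;
		if (!*se)
			return -1;
	}
	while (pse_depth > se_depth) {
		pse_depth--;
		*pse = (*pse)->parent;
		if (!*pse)
			return -1;
	}

	/* Then climb in lockstep until both share a queue. */
	while ((*se)->cfs_rq != (*pse)->cfs_rq) {
		*se = (*se)->parent;
		*pse = (*pse)->parent;
		if (!*se || !*pse)
			return -1;
	}
	return 0;
}
```

With a task two groups deep whose cached depth says 1 (as in the crash
output above), the first loop stops one level short, the lockstep loop then
walks the shallower entity past the root, and the model returns -1.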

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6e2e348..ced5534 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8035,7 +8035,6 @@ static void task_move_group_fair(struct task_struct *p, int queued)
>      if (!queued)
>          se->vruntime -= cfs_rq_of(se)->min_vruntime;
>      set_task_rq(p, task_cpu(p));
> -    se->depth = se->parent ? se->parent->depth + 1 : 0;
>      if (!queued) {
>          cfs_rq = cfs_rq_of(se);
>          se->vruntime += cfs_rq->min_vruntime;

So at this point I'm left wondering about that depth update we have in
switched_to_fair().

Which leads me to suggest the following (note that some of this code has
_just_ changed a lot).

Does that work for you? (not been near a compiler).

---
 kernel/sched/fair.c  | 10 +---------
 kernel/sched/sched.h |  1 +
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9176f7c588a8..fc3ef8fb6891 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8000,13 +8000,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
-	/*
-	 * Since the real-depth could have been changed (only FAIR
-	 * class maintain depth value), reset depth properly.
-	 */
-	se->depth = se->parent ? se->parent->depth + 1 : 0;
-#endif
+	set_task_rq(p, task_cpu(p));
 
 	/* Synchronize task with its cfs_rq */
 	attach_entity_load_avg(cfs_rq, se);
@@ -8072,8 +8066,6 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 static void task_move_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
-	set_task_rq(p, task_cpu(p));
-
 #ifdef CONFIG_SMP
 	/* Tell se's cfs_rq has been changed -- migrated */
 	p->se.avg.last_update_time = 0;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 167ab4844ee6..dde8881f16bc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -931,6 +931,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq = tg->cfs_rq[cpu];
 	p->se.parent = tg->se[cpu];
+	p->se.depth = p->se.parent ? p->se.parent->depth + 1 : 0;
 #endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
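
The point of the sched.h hunk is that depth gets recomputed in the one
place that assigns the parent, so the two cannot drift apart. A minimal
userspace sketch of that invariant, assuming a cut-down entity with only
the two fields that matter (struct toy_se and toy_set_parent() are
hypothetical names, not kernel symbols):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down entity: only the fields the invariant needs. */
struct toy_se {
	struct toy_se *parent;	/* NULL for an entity on the root queue */
	int depth;		/* must always equal parent depth + 1, or 0 */
};

/*
 * Single point of assignment for the parent pointer. Updating depth
 * here means no caller can change the parent and forget the depth.
 */
static void toy_set_parent(struct toy_se *se, struct toy_se *parent)
{
	assert(se);
	se->parent = parent;
	se->depth = parent ? parent->depth + 1 : 0;
}
```

This only covers leaf entities such as tasks; the sketch does not try to
refresh the depth of any children when a group entity itself is moved.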