linux-kernel - Re: [PATCH 1/3] sched: fix cgroup movement of newly created process

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 13 Dec 2011 04:01:21 -0800
From:	Paul Turner <pjt@...gle.com>
To:	Daisuke Nishimura <nishimura@....nes.nec.co.jp>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	cgroups <cgroups@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH 1/3] sched: fix cgroup movement of newly created process

On 12/12/2011 10:57 PM, Daisuke Nishimura wrote:

> There is a small race between do_fork() and sched_move_task(), which is trying
> to move the child.
>
>             do_fork()                 sched_move_task()
> --------------------------------+---------------------------------
>   copy_process()
>     sched_fork()
>       task_fork_fair()
>         -> vruntime of the child is initialized
>            based on that of the parent.


Hmm, so here vruntime of child is actually initialized to vruntime - min_V

>   -> we can see the child in "tasks" file now.
>                                     task_rq_lock()
>                                     task_move_group_fair()


So since on a regular fork we just copy and don't actually go through
the attach muck I'm assuming this is an external actor who's seen the
child in the tasks file and is moving it?

>                                       ->child.se.vruntime -= (old)cfs_rq->min_vruntime
>                                       ->child.se.vruntime += (new)cfs_rq->min_vruntime


This would then add delta min_V between the two cfs_rqs

>                                     task_rq_unlock()
>   wake_up_new_task()
>     ...
>     enqueue_entity()
>       child.se->vruntime += cfs_rq->min_vruntime

>
> As a result, vruntime of the child becomes far bigger than min_vruntime,
> if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime.
>
> This patch fixes this problem by just ignoring such process in task_move_group_fair(),
> because the vruntime has already been normalized in task_fork_fair().
>
> Signed-off-by: Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@...lic.gmane.org>
> ---This would need an explanatory
>  kernel/sched_fair.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 5c9e679..df145a9 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -4922,10 +4922,10 @@ static void task_move_group_fair(struct task_struct *p, int on_rq)
>  	 * to another cgroup's rq. This does somewhat interfere with the
>  	 * fair sleeper stuff for the first placement, but who cares.
>  	 */
> -	if (!on_rq)
> +	if (!on_rq && p->state != TASK_RUNNING)
>  		p->se.vruntime -= cfs_rq_of(&p->se)->min_vruntime;


We also have the choice of keying off something like
!se.sum_exec_runtime here which is reset in sched_fork() which might
be less fragile/make the problem interaction more obvious to.  Either
way this is really a corner case deserving of an explanatory comment.
This is a little icky but I don't have any wildly better ideas.

>  	set_task_rq(p, task_cpu(p));
> -	if (!on_rq)
> +	if (!on_rq && p->state != TASK_RUNNING)
>  		p->se.vruntime += cfs_rq_of(&p->se)->min_vruntime;
>  }
>  #endif


Acked-by: Paul Turner <pjt@...gle.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/