linux-kernel - Re: [PATCH 4/4] sched,fair: Fix PELT integrity for new tasks

Message-ID: <CAKfTPtAUtteP0V5-1u2n0YFDdFbZfwSqigHjOfM32Vbw3iAPbg@mail.gmail.com>
Date:	Fri, 17 Jun 2016 16:09:01 +0200
From:	Vincent Guittot <vincent.guittot@...aro.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Yuyang Du <yuyang.du@...el.com>, Ingo Molnar <mingo@...nel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Benjamin Segall <bsegall@...gle.com>,
	Paul Turner <pjt@...gle.com>,
	Morten Rasmussen <morten.rasmussen@....com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Matt Fleming <matt@...eblueprint.co.uk>
Subject: Re: [PATCH 4/4] sched,fair: Fix PELT integrity for new tasks

On 17 June 2016 at 14:01, Peter Zijlstra <peterz@...radead.org> wrote:
> Vincent and Yuyang found another few scenarios in which entity
> tracking goes wobbly.
>
> The scenarios are basically due to the fact that new tasks are not
> immediately attached and thereby differ from the normal situation -- a
> task is always attached to a cfs_rq load average (such that it
> includes its blocked contribution) and is explicitly
> detached/attached on migration to another cfs_rq.
>
> Scenario 1: switch to fair class
>
>   p->sched_class = fair_class;
>   if (queued)
>     enqueue_task(p);
>       ...
>         enqueue_entity()
>           enqueue_entity_load_avg()
>             migrated = !sa->last_update_time (true)
>             if (migrated)
>               attach_entity_load_avg()
>   check_class_changed()
>     switched_from() (!fair)
>     switched_to()   (fair)
>       switched_to_fair()
>         attach_entity_load_avg()
>
> If @p is a new task that hasn't been fair before, it will have
> !last_update_time and, per the above, end up in
> attach_entity_load_avg() _twice_.
>
> Scenario 2: change between cgroups
>
>   sched_move_group(p)
>     if (queued)
>       dequeue_task()
>     task_move_group_fair()
>       detach_task_cfs_rq()
>         detach_entity_load_avg()
>       set_task_rq()
>       attach_task_cfs_rq()
>         attach_entity_load_avg()
>     if (queued)
>       enqueue_task();
>         ...
>           enqueue_entity()
>             enqueue_entity_load_avg()
>               migrated = !sa->last_update_time (true)
>               if (migrated)
>                 attach_entity_load_avg()
>
> As in scenario 1, if @p is a new task, it will have
> !last_update_time and we'll end up in attach_entity_load_avg()
> _twice_.
>
> Furthermore, notice how we do a detach_entity_load_avg() on something
> that wasn't attached to begin with.
>
> As stated above; the problem is that the new task isn't yet attached
> to the load tracking and thereby violates the invariant assumption.
>
> This patch remedies this by ensuring a new task is indeed properly
> attached to the load tracking on creation, through
> post_init_entity_util_avg().
>
> Of course, this isn't entirely as straightforward as one might think,
> since the task is hashed before we call wake_up_new_task() and thus
> can be poked at. We avoid this by adding TASK_NEW and teaching
> cpu_cgroup_can_attach() to refuse such tasks.
>
> Cc: Yuyang Du <yuyang.du@...el.com>
> Reported-by: Vincent Guittot <vincent.guittot@...aro.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
...
>
> +static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
> +static int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq);
> +static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se);
> +
>  /*
>   * With new tasks being created, their initial util_avgs are extrapolated
>   * based on the cfs_rq's current util_avg:
> @@ -733,18 +737,21 @@ void post_init_entity_util_avg(struct sc
>                 }
>                 sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
>         }
> +
> +       update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, false);
> +       attach_entity_load_avg(cfs_rq, se);

A new RT task will also be attached here and will contribute to the
cfs_rq load until that contribution decays to 0.
Should we detach it for non-CFS tasks? For an RT task we just want to
set last_update_time to something other than 0.
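
To make the question concrete, here is a toy userspace model, not kernel
code. All toy_* names and the is_fair flag are made up for illustration,
and decay is left out entirely. It only contrasts "attach every new
entity" with "attach fair entities, just stamp last_update_time for the
others"; it is a sketch of the idea, not a proposed implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct toy_avg {
	uint64_t last_update_time;	/* 0 means "never attached" */
	unsigned long load_avg;
};

struct toy_cfs_rq {
	struct toy_avg avg;
};

struct toy_entity {
	struct toy_avg avg;
	bool is_fair;	/* hypothetical: task currently in the fair class */
};

/* Stamp the entity and add its contribution to the runqueue average. */
static void toy_attach(struct toy_cfs_rq *cfs_rq, struct toy_entity *se,
		       uint64_t now)
{
	assert(now != 0);	/* 0 is reserved for "never attached" */
	se->avg.last_update_time = now;
	cfs_rq->avg.load_avg += se->avg.load_avg;
}

/* Variant A: attach every new entity, whatever its class. */
static void toy_post_init_always_attach(struct toy_cfs_rq *cfs_rq,
					struct toy_entity *se, uint64_t now)
{
	toy_attach(cfs_rq, se, now);
}

/*
 * Variant B: only fair entities are attached. For the others we just
 * make last_update_time non-zero, so later code does not mistake the
 * entity for one that still needs its initial attach.
 */
static void toy_post_init_fair_only(struct toy_cfs_rq *cfs_rq,
				    struct toy_entity *se, uint64_t now)
{
	assert(now != 0);
	if (!se->is_fair) {
		se->avg.last_update_time = now;
		return;
	}
	toy_attach(cfs_rq, se, now);
}
```

With variant A, a new RT entity with load_avg 512 leaves 512 on the toy
cfs_rq. With variant B, the cfs_rq load stays unchanged and the entity
still ends up with a non-zero last_update_time.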

>  }
>
>  static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq);
>  static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq);
> -#else
> +#else /* !CONFIG_SMP */
>  void init_entity_runnable_average(struct sched_entity *se)
>  {
>  }
>  void post_init_entity_util_avg(struct sched_entity *se)
>  {
>  }
> -#endif
> +#endif /* CONFIG_SMP */
>
>  /*
>   * Update the current task's runtime statistics.
> @@ -2847,8 +2854,6 @@ void set_task_rq_fair(struct sched_entit
>  static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force) {}
>  #endif /* CONFIG_FAIR_GROUP_SCHED */
>
> -static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
> -
>  static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
>  {
>         struct rq *rq = rq_of(cfs_rq);
> @@ -2958,6 +2963,8 @@ static void attach_entity_load_avg(struc
>         /*
>          * If we got migrated (either between CPUs or between cgroups) we'll
>          * have aged the average right before clearing @last_update_time.
> +        *
> +        * Or we're fresh through post_init_entity_util_avg().
>          */
>         if (se->avg.last_update_time) {
>                 __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> @@ -3063,11 +3070,14 @@ void remove_entity_load_avg(struct sched
>         u64 last_update_time;
>
>         /*
> -        * Newly created task or never used group entity should not be removed
> -        * from its (source) cfs_rq
> +        * tasks cannot exit without having gone through wake_up_new_task() ->
> +        * post_init_entity_util_avg() which will have added things to the
> +        * cfs_rq, so we can remove unconditionally.
> +        *
> +        * Similarly for groups, they will have passed through
> +        * post_init_entity_util_avg() before unregister_sched_fair_group()
> +        * calls this.
>          */
> -       if (se->avg.last_update_time == 0)
> -               return;
>
>         last_update_time = cfs_rq_last_update_time(cfs_rq);
>
>
>