linux-kernel - Re: [patch 06/16] sched: account for blocked load waking back up

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <xm26fw6xd7cu.fsf@sword-of-the-dawn.mtv.corp.google.com>
Date:	Tue, 04 Sep 2012 10:29:37 -0700
From:	Benjamin Segall <bsegall@...gle.com>
To:	Preeti Murthy <preeti.lkml@...il.com>
Cc:	pjt@...gle.com, linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
	Venki Pallipadi <venki@...gle.com>,
	Mike Galbraith <efault@....de>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Nikunj A Dadhania <nikunj@...ux.vnet.ibm.com>,
	Morten Rasmussen <Morten.Rasmussen@....com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Namhyung Kim <namhyung@...nel.org>
Subject: Re: [patch 06/16] sched: account for blocked load waking back up

Preeti Murthy <preeti.lkml@...il.com> writes:

> Hi Paul,
>
>     @@ -1170,20 +1178,42 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
>                                                       struct sched_entity *se,
>                                                       int wakeup)
>      {
>     -       /* we track migrations using entity decay_count == 0 */
>     -       if (unlikely(!se->avg.decay_count)) {
>     +       /*
>     +        * We track migrations using entity decay_count <= 0, on a wake-up
>     +        * migration we use a negative decay count to track the remote decays
>     +        * accumulated while sleeping.
>     +        */
>     +       if (unlikely(se->avg.decay_count <= 0)) {
>                     se->avg.last_runnable_update = rq_of(cfs_rq)->clock_task;
>     +               if (se->avg.decay_count) {
>     +                       /*
>     +                        * In a wake-up migration we have to approximate the
>     +                        * time sleeping.  This is because we can't synchronize
>     +                        * clock_task between the two cpus, and it is not
>     +                        * guaranteed to be read-safe.  Instead, we can
>     +                        * approximate this using our carried decays, which are
>     +                        * explicitly atomically readable.
>     +                        */
>     +                       se->avg.last_runnable_update -= (-se->avg.decay_count)
>     +                                                       << 20;
>     +                       update_entity_load_avg(se, 0);
>     +                       /* Indicate that we're now synchronized and on-rq */
>     +                       se->avg.decay_count = 0;
>     +               }
>                     wakeup = 0;
>             } else {
>                     __synchronize_entity_decay(se);
>
>  
> Should not the last_runnable_update of se get updated in __synchronize_entity_decay()?
> Because it contains the value of the runnable update before going to sleep.If not updated,when
> update_entity_load_avg() is called below during a local wakeup,it will decay the runtime load
> for the duration including the time the sched entity has slept.

If you are asking if it should be updated in the else block (local
wakeup, no migration) here, no:

* __synchronize_entity_decay will decay load_avg_contrib to match the
  decay that the cfs_rq has done, keeping those in sync, and ensuring we
  don't subtract too much when we update our current load average.
* clock_task - last_runnable_update will be the amount of time that the
  task has been blocked. update_entity_load_avg (below) and
  __update_entity_runnable_avg will account this time as non-runnable
  time into runnable_avg_sum/period, and from there onto the cfs_rq via
  __update_entity_load_avg_contrib.

Both of these are necessary, and will happen. In the case of !wakeup,
the task is being moved between groups or is migrating between cpus, and
we pretend (to the best of our ability in the case of migrating between
cpus which may have different clock_tasks) that the task has been
runnable this entire time.

In the more general case, no, it is called from migrate_task_rq_fair,
which doesn't have the necessary locks to read clock_task.

>
> This also means that during dequeue_entity_load_avg(),update_entity_load_avg() needs to be
> called to keep the runnable_avg_sum of the sched entity updated till
> before sleep.

Yes, this happens first thing in dequeue_entity_load_avg.
>
>             }
>    
>     -       if (wakeup)
>     +       /* migrated tasks did not contribute to our blocked load */
>     +       if (wakeup) {
>                     subtract_blocked_load_contrib(cfs_rq, se->avg.load_avg_contrib);
>     +               update_entity_load_avg(se, 0);
>     +       }
>    
>     -       update_entity_load_avg(se, 0);
>             cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;
>     -       update_cfs_rq_blocked_load(cfs_rq);
>     +       /* we force update consideration on load-balancer moves */
>     +       update_cfs_rq_blocked_load(cfs_rq, !wakeup);
>      }
>    
>       --
>     To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>     the body of a message to majordomo@...r.kernel.org
>     More majordomo info at  http://vger.kernel.org/majordomo-info.html
>     Please read the FAQ at  http://www.tux.org/lkml/
>
> Regards
> Preeti