Message-ID: <CAPM31RLJe19FH+RYipwm4kwtzmFHoS2xGTx4iZ9+qwDJZ_pf3A@mail.gmail.com>
Date: Mon, 6 May 2013 01:46:19 -0700
From: Paul Turner <pjt@...gle.com>
To: Alex Shi <alex.shi@...el.com>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Borislav Petkov <bp@...en8.de>,
Namhyung Kim <namhyung@...nel.org>,
Mike Galbraith <efault@....de>,
Morten Rasmussen <morten.rasmussen@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
LKML <linux-kernel@...r.kernel.org>,
Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
Michael Wang <wangyun@...ux.vnet.ibm.com>
Subject: Re: [PATCH v5 5/7] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
On Sun, May 5, 2013 at 6:45 PM, Alex Shi <alex.shi@...el.com> wrote:
> These are the base values used in load balancing; updating them with
> the rq runnable load average lets the load balancer take the runnable
> load avg into account naturally.
>
> Signed-off-by: Alex Shi <alex.shi@...el.com>
> ---
> kernel/sched/core.c | 4 ++--
> kernel/sched/fair.c | 4 ++--
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 33bcebf..2f51636 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2536,7 +2536,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
> void update_idle_cpu_load(struct rq *this_rq)
> {
> unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
> - unsigned long load = this_rq->load.weight;
> + unsigned long load = (unsigned long)this_rq->cfs.runnable_load_avg;
We should be minimizing:

  Variance[ for all i ]{ cfs_rq[i]->runnable_load_avg +
                         cfs_rq[i]->blocked_load_avg }
blocked_load_avg is the expected "to wake" contribution from tasks
already assigned to this rq.
e.g. this could be:

  load = this_rq->cfs.runnable_load_avg + this_rq->cfs.blocked_load_avg;
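
A hedged sketch of how that suggestion might land in
update_idle_cpu_load() (both fields are u64, so the sum is kept in a
u64 intermediate; the rest of the function is unchanged):

  void update_idle_cpu_load(struct rq *this_rq)
  {
          unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
          /* runnable + blocked: all load currently attached to this rq */
          u64 load = this_rq->cfs.runnable_load_avg +
                     this_rq->cfs.blocked_load_avg;
          unsigned long pending_updates;
          ...
  }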
Although, in general, I have a major concern with the current
implementation:

The entire reason for stability with the bottom-up averages is that
when load migrates between cpus we are able to migrate it between the
tracked sums.

Stuffing observed averages of these into the load_idxs loses that
mobility; we will have to stall (as we do today for idx > 0) before we
can recognize that a cpu's load has truly left it. This is a very
similar problem to the need to stably track load for the group shares
computation.
To that end, I would rather see the load_idx disappear completely:
 (a) We can calculate the imbalance purely from the delta of
     (runnable_avg + blocked_avg); a rough sketch follows below.
 (b) It eliminates a bad tunable.
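
As a purely illustrative sketch of (a), under the assumption that the
combined average is the only load signal (cpu_combined_load() and
avg_delta_imbalance() are hypothetical helpers, not existing kernel
code):

  /* hypothetical: the total load attached to a cpu's root cfs_rq */
  static u64 cpu_combined_load(int cpu)
  {
          struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;

          return cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
  }

  /* hypothetical: imbalance as half the delta, no load_idx decay */
  static u64 avg_delta_imbalance(int busiest_cpu, int local_cpu)
  {
          u64 busiest = cpu_combined_load(busiest_cpu);
          u64 local = cpu_combined_load(local_cpu);

          return busiest > local ? (busiest - local) / 2 : 0;
  }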
> unsigned long pending_updates;
>
> /*
> @@ -2586,7 +2586,7 @@ static void update_cpu_load_active(struct rq *this_rq)
> * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
> */
> this_rq->last_load_update_tick = jiffies;
> - __update_cpu_load(this_rq, this_rq->load.weight, 1);
> + __update_cpu_load(this_rq, this_rq->cfs.runnable_load_avg, 1);
>
> calc_load_account_active(this_rq);
> }
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2881d42..0bf88e8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2900,7 +2900,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> /* Used instead of source_load when we know the type == 0 */
> static unsigned long weighted_cpuload(const int cpu)
> {
> - return cpu_rq(cpu)->load.weight;
> + return (unsigned long)cpu_rq(cpu)->cfs.runnable_load_avg;
Isn't this going to truncate in the 32-bit case?
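
runnable_load_avg is u64, so on 32-bit the cast simply drops the high
bits. Purely as an illustration (not a proposed fix), a min_t() clamp
would at least saturate rather than silently truncate:

  /* illustrative only: saturate instead of truncating on 32-bit */
  static unsigned long weighted_cpuload(const int cpu)
  {
          return min_t(u64, cpu_rq(cpu)->cfs.runnable_load_avg,
                       ULONG_MAX);
  }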
> }
>
> /*
> @@ -2947,7 +2947,7 @@ static unsigned long cpu_avg_load_per_task(int cpu)
> unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
>
> if (nr_running)
> - return rq->load.weight / nr_running;
> + return (unsigned long)rq->cfs.runnable_load_avg / nr_running;
>
> return 0;
> }
> --
> 1.7.12
>