Message-ID: <4FDF3E30.2090307@gmail.com>
Date:	Mon, 18 Jun 2012 22:41:52 +0800
From:	Charles Wang <muming.wq@...il.com>
To:	Doug Smythies <dsmythies@...us.net>
CC:	'Peter Zijlstra' <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, 'Ingo Molnar' <mingo@...hat.com>,
	'Tao Ma' <tm@....ma>,
	'含黛' <handai.szj@...bao.com>
Subject: Re: [PATCH] sched: Folding nohz load accounting more accurate

Peter's patch works well.  I have now ported the second patch, which
fixes the high load problem, on top of Peter's. It works fine in my test
environment.  Doug, please try this. Thanks.


We tend to assume that per-cpu sampling treats idle and non-idle cpus
equally, but it may not. For a non-idle cpu, the sample reflects the load
at the moment of sampling. For an idle cpu, because of nohz, the sampling
is delayed until nohz exit (less than 1 tick after nohz exit). Nohz exit
is always caused by a process being woken up, so by then the cpu is
non-idle. That is not fair: what should have been an idle sample turns
into a non-idle sample, and this makes loadavg higher than normal.

     time-expected-sampling
                   |    time-do-sampling
                   |         |
                   V         V
-|-------------------------|--
start_nohz              stop_nohz
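
The timeline above can be illustrated with a small userspace model. This
is only a sketch for discussion, not kernel code: struct cpu_model and
both helpers are hypothetical names, time is counted in ticks, and the
cpu is assumed to run exactly one task whenever it is not in nohz idle.

```c
#include <assert.h>

/* Hypothetical model of one cpu: idle in [nohz_start, nohz_stop). */
struct cpu_model {
	long nohz_start;	/* tick at which the cpu enters nohz idle */
	long nohz_stop;		/* tick at which a wakeup ends nohz idle */
};

/* Runnable tasks at tick t: 0 while in nohz idle, 1 otherwise (assumption). */
static long nr_active_at(const struct cpu_model *c, long t)
{
	assert(c->nohz_start <= c->nohz_stop);
	return (t >= c->nohz_start && t < c->nohz_stop) ? 0 : 1;
}

/*
 * Tick at which the sample is really taken. If the expected tick falls
 * inside nohz idle, no tick fires there, so the sample is deferred to
 * nohz exit in this model.
 */
static long actual_sample_tick(const struct cpu_model *c, long expected)
{
	assert(c->nohz_start <= c->nohz_stop);
	if (expected >= c->nohz_start && expected < c->nohz_stop)
		return c->nohz_stop;
	return expected;
}
```

With nohz_start = 100, nohz_stop = 130 and an expected sample at tick
120, nr_active_at() reports 0 at the expected tick, but the sample is
really taken at tick 130, where it reports 1. The idle sample has become
a non-idle one.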

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4101a0e..180e612 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2228,6 +2228,7 @@ void calc_load_account_idle(struct rq *this_rq)
        int idx;

        delta = calc_load_fold_active(this_rq);
+       this_rq->last_idle_enter = jiffies;

        if (delta) {
                idx = calc_load_write_idx();
                atomic_long_add(delta, &calc_load_idle[idx]);
@@ -2431,15 +2432,27 @@ void calc_global_load(void)
 static void calc_load_account_active(struct rq *this_rq)
 {
        long delta;
+       unsigned long delta_time;
+       long last_idle_time_elapse;

        if (time_before(jiffies, this_rq->calc_load_update))
                return;

+       last_idle_time_elapse = this_rq->last_idle_enter - calc_load_update;
+       delta_time = jiffies - this_rq->calc_load_update;
+
+       if (last_idle_time_elapse > 0)
+               goto out;
+
+       if ((last_idle_time_elapse > -1) && (delta_time >= 1))
+               goto out;
+
        delta  = calc_load_fold_active(this_rq);
        delta += calc_load_fold_idle();
        if (delta)
                atomic_long_add(delta, &calc_load_tasks);

+out:
        this_rq->calc_load_update += LOAD_FREQ;
 }

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4134d37..a356588 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -438,6 +438,7 @@ struct rq {

        /* calc_load related fields */
        unsigned long calc_load_update;
+       unsigned long last_idle_enter;
        long calc_load_active;

 #ifdef CONFIG_SCHED_HRTICK
-- 
1.7.9.5
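
For anyone who wants to poke at the two early exits added to
calc_load_account_active() without booting a kernel, here is a userspace
transcription of just that decision. It is a sketch under assumptions:
struct load_stamp and skips_fold() are hypothetical names, and the bare
calc_load_update read by the patch is assumed to be the file-scope
variable in core.c rather than the per-rq field.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the values the patch reads (hypothetical struct). */
struct load_stamp {
	unsigned long now;		/* jiffies */
	unsigned long rq_update;	/* this_rq->calc_load_update */
	unsigned long global_update;	/* calc_load_update (assumed file-scope) */
	unsigned long last_idle_enter;	/* this_rq->last_idle_enter */
};

/* Returns true when the patched function would jump to "out" and skip the fold. */
static bool skips_fold(const struct load_stamp *s)
{
	long last_idle_time_elapse;
	unsigned long delta_time;

	/* The kernel function returns earlier when jiffies is before rq_update. */
	assert((long)(s->now - s->rq_update) >= 0);

	last_idle_time_elapse = (long)(s->last_idle_enter - s->global_update);
	delta_time = s->now - s->rq_update;

	/* Last idle entry is later than the global update stamp. */
	if (last_idle_time_elapse > 0)
		return true;

	/* Idle entry exactly at the stamp, and at least one jiffy has passed. */
	if (last_idle_time_elapse > -1 && delta_time >= 1)
		return true;

	return false;
}
```

In this model an idle entry well before the update stamp never skips the
fold, an idle entry after the stamp always does, and an idle entry
exactly at the stamp skips only once jiffies has moved past rq_update.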