linux-kernel - Re: Inconsistent load average on tickless kernels

Message-ID: <1330532667.11248.153.camel@twins>
Date:	Wed, 29 Feb 2012 17:24:27 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Lesław Kopeć <leslaw.kopec@...za-klasa.pl>
Cc:	Aman Gupta <aman@...1.net>, linux-kernel@...r.kernel.org,
	Chase Douglas <chase.douglas@...onical.com>,
	Damien Wyart <damien.wyart@...e.fr>,
	Kyle McMartin <kyle@...hat.com>,
	Venkatesh Pallipadi <venki@...gle.com>,
	Jonathan Nieder <jrnieder@...il.com>
Subject: Re: Inconsistent load average on tickless kernels

On Wed, 2012-02-29 at 13:06 +0100, Peter Zijlstra wrote:
> 
> > Steps to reproduce: run a bunch of CPU bound processes that will not use
> > all available cycles. The biggest difference between expected and
> > measured load is around 30% CPU utilization in my case.
> 
> Hrmm, this suggests we age too hard with nohz code.. in your test case
> is there significant idle time? That is, suppose you run each cpu at 30%
> what is the period of your load? Running 3s out of 10s is significantly
> different from running .3ms out of 1ms.

I can indeed see some weirdness, and not only downwards: I can manage to
get a load of 1 with two 20% burners (0.1 ms period). I still need to try
with bigger periods.
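
For anyone wanting to reproduce this kind of test, a "burner" with a fixed duty cycle might look like the sketch below. It is not the program used in this thread; `burn()` and `now_ns()` are hypothetical helpers. Note that `nanosleep()` usually overshoots very short sleeps, so the real utilization at sub-millisecond periods will differ from the nominal figure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: run `cycles` periods of `period_ns` each, spinning
 * for `busy_pct` percent of every period and sleeping for the remainder.
 * Returns the total number of nanoseconds spent spinning.
 */
static uint64_t burn(uint64_t period_ns, unsigned int busy_pct,
		     unsigned int cycles)
{
	uint64_t busy_ns, idle_ns, spun = 0;

	assert(busy_pct <= 100);
	busy_ns = period_ns * busy_pct / 100;
	idle_ns = period_ns - busy_ns;

	for (unsigned int i = 0; i < cycles; i++) {
		uint64_t start = now_ns();
		uint64_t t;
		struct timespec idle;

		/* Busy phase: spin until busy_ns have elapsed. */
		while ((t = now_ns()) - start < busy_ns)
			;
		spun += t - start;

		/* Idle phase: sleep for the rest of the period. */
		idle.tv_sec = (time_t)(idle_ns / 1000000000ull);
		idle.tv_nsec = (long)(idle_ns % 1000000000ull);
		nanosleep(&idle, NULL);
	}
	return spun;
}
```

Calling `burn(100000, 20, n)` in two processes would approximate "two 20% burners" with a 0.1 ms period; `burn(10000000000ull, 30, n)` would give the 3 s out of 10 s case.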

> > Have there been any other patches that correct load calculation? Maybe
> > I'm testing it in a wrong way? I'd appreciate any suggestions. I'd be
> > happy to test new patches. Sadly, I cannot propose any fixes as kernel
> > sources are still a mystery to me.
> 
> Darned load-tracking stuff.. I went over it again but couldn't spot
> anything obviously broken. I suspect the tail magic of
> calc_global_nohz() is busted, just not seeing it atm.
> 
> Will go brew myself a fresh pot of tea and stare more.
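
As background on the aging being discussed, here is a userspace sketch of fixed-point exponential decay, including the catch-up over `n` missed sample periods that a nohz path has to perform. The constants and function names are modelled on my recollection of the kernel source of that era and do not appear in this message, so treat them as assumptions rather than a copy of `calc_global_nohz()`.

```c
#include <assert.h>

#define FSHIFT	11			/* bits of fractional precision */
#define FIXED_1	(1UL << FSHIFT)		/* 1.0 in fixed point */
#define EXP_1	1884UL			/* assumed: 1/exp(5s/1min), fixed point */

/* One sample period: load = load * exp + active * (1 - exp), rounded. */
static unsigned long calc_load(unsigned long load, unsigned long exp,
			       unsigned long active)
{
	assert(exp <= FIXED_1);
	load *= exp;
	load += active * (FIXED_1 - exp);
	load += 1UL << (FSHIFT - 1);	/* round to nearest */
	return load >> FSHIFT;
}

/* x^n in fixed point, by repeated squaring. */
static unsigned long fixed_power_int(unsigned long x, unsigned int frac_bits,
				     unsigned int n)
{
	unsigned long result = 1UL << frac_bits;

	while (n) {
		if (n & 1) {
			result *= x;
			result += 1UL << (frac_bits - 1);
			result >>= frac_bits;
		}
		n >>= 1;
		if (!n)
			break;
		x *= x;
		x += 1UL << (frac_bits - 1);
		x >>= frac_bits;
	}
	return result;
}

/* Apply n sample periods at once, assuming `active` was constant. */
static unsigned long calc_load_n(unsigned long load, unsigned long exp,
				 unsigned long active, unsigned int n)
{
	return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
}
```

The point of the sketch is that the `n`-period shortcut only matches reality if the active count really was constant over the skipped periods; any error in `n` or in the sampled `active` ages the average too much or too little.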

The only thing I could find is that on nohz we can confuse the per-rq
sample period. Does the patch below make a difference?
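
One possible reading of "confuse the per-rq sample period", sketched as a toy userspace model: if a CPU takes no ticks for several sample windows, a per-rq deadline that only advances by one `LOAD_FREQ` per sample falls behind, and the CPU then samples on consecutive ticks in the middle of a window. This is an interpretation and not something the message states; `struct toy_rq`, `tick_per_rq()`, `tick_global()` and the tiny `LOAD_FREQ` are all hypothetical.

```c
#include <assert.h>

#define LOAD_FREQ 5UL	/* toy value in ticks; the kernel's constant differs */

/* Toy stand-in for struct rq: only the field the patch removes. */
struct toy_rq {
	unsigned long calc_load_update;
};

/*
 * Pre-patch check as read from the diff: sample once the rq's own
 * deadline has passed, then push that deadline out by one LOAD_FREQ.
 * Returns 1 if this tick sampled. (The kernel uses time_before() to
 * cope with jiffies wrap-around; plain comparison is enough here.)
 */
static int tick_per_rq(struct toy_rq *rq, unsigned long now)
{
	if (now < rq->calc_load_update)
		return 0;
	rq->calc_load_update += LOAD_FREQ;
	return 1;
}

/* Patched check: compare against the single global deadline. */
static int tick_global(unsigned long global_deadline, unsigned long now)
{
	return now >= global_deadline;
}

/* Count the samples an rq takes over ticks [from, to], per-rq scheme. */
static unsigned int samples_per_rq(struct toy_rq *rq, unsigned long from,
				   unsigned long to)
{
	unsigned int n = 0;

	assert(from <= to);
	for (unsigned long t = from; t <= to; t++)
		n += tick_per_rq(rq, t);
	return n;
}
```

In this model, an rq whose deadline is stuck at tick 5 and which wakes at tick 22 samples on ticks 22, 23 and 24, while a global deadline that had advanced on schedule to tick 25 would trigger no sample on those ticks.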

---
 kernel/sched/core.c  |    9 +--------
 kernel/sched/sched.h |    1 -
 2 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d7c4322..370c578 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2372,15 +2372,13 @@ static void calc_load_account_active(struct rq *this_rq)
 {
 	long delta;
 
-	if (time_before(jiffies, this_rq->calc_load_update))
+	if (time_before(jiffies, calc_load_update))
 		return;
 
 	delta  = calc_load_fold_active(this_rq);
 	delta += calc_load_fold_idle();
 	if (delta)
 		atomic_long_add(delta, &calc_load_tasks);
-
-	this_rq->calc_load_update += LOAD_FREQ;
 }
 
 /*
@@ -5329,10 +5327,6 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 
-	case CPU_UP_PREPARE:
-		rq->calc_load_update = calc_load_update;
-		break;
-
 	case CPU_ONLINE:
 		/* Update our root-domain */
 		raw_spin_lock_irqsave(&rq->lock, flags);
@@ -6879,7 +6873,6 @@ void __init sched_init(void)
 		raw_spin_lock_init(&rq->lock);
 		rq->nr_running = 0;
 		rq->calc_load_active = 0;
-		rq->calc_load_update = jiffies + LOAD_FREQ;
 		init_cfs_rq(&rq->cfs);
 		init_rt_rq(&rq->rt, rq);
 #ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 8a2c768..59b5a33 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -441,7 +441,6 @@ struct rq {
 #endif
 
 	/* calc_load related fields */
-	unsigned long calc_load_update;
 	long calc_load_active;
 
 #ifdef CONFIG_SCHED_HRTICK
