Date:	Wed, 29 Feb 2012 13:06:35 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Lesław Kopeć <leslaw.kopec@...za-klasa.pl>
Cc:	Aman Gupta <aman@...1.net>, linux-kernel@...r.kernel.org,
	Chase Douglas <chase.douglas@...onical.com>,
	Damien Wyart <damien.wyart@...e.fr>,
	Kyle McMartin <kyle@...hat.com>,
	Venkatesh Pallipadi <venki@...gle.com>,
	Jonathan Nieder <jrnieder@...il.com>
Subject: Re: Inconsistent load average on tickless kernels

On Thu, 2012-02-23 at 16:46 +0100, Lesław Kopeć wrote:

> Each kernel was compiled with CONFIG_NO_HZ enabled (no-hz variant) and
> disabled (hz variant). Here's a snapshot of load 15 on each kernel:

> kernel                          no-hz   hz
> 2.6.32.55-*                     0.59    0.57
> 2.6.32.55-*-74f5187ac8          3.56    11.79
> 2.6.32.55-*-0f004f5a69          0.61    11.76
> 2.6.37-rc5-*-0f004f5a69         0.67    11.65
> 2.6.37-rc5-*-pre-0f004f5a69     3.97    12.05

Missing here is a kernel built with CONFIG_NO_HZ but booted with
nohz=off; this would be an interesting data point because it includes
all the funny code but still ticks at the right frequency.

> My observations are:
> 
> 1. On tickless kernels load is very low when neither or both patches
> (74f5187ac8 and 0f004f5a69) are applied.
> 
> 2. Kernels that have only patch 74f5187ac8 applied show the smallest
> difference between hz and no-hz variants. Still, no-hz kernels
> return values lower than their hz siblings.
> 
> 3. Non-tickless kernels seem to be reporting correct load values.
> The overall trend and values match CPU utilization. The only exception
> is 2.6.32.55-hz, which reports the same values as 2.6.32.55-no-hz.
> 
> 4. If x processes are using all available cycles, load is correctly
> incremented by x. This behavior is consistent on all kernels.

Yay! At least we get something right. Also, I think we will actually go
down to load 0 if the machine is idle; we used to get that wrong for
nohz too.
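
Both observations (x CPU hogs give a load of x, and an idle machine decays to 0) follow from the load average being an exponentially decaying average of the active-task count, kept in fixed point. The sketch below is modeled on that scheme: FSHIFT, FIXED_1 and the EXP_1/EXP_5/EXP_15 values are the kernel's load-average constants, while calc_load_step() and run_steps() are hypothetical names. Rounding details differ between kernel versions, so treat this as an illustration under those assumptions, not the kernel code.

```c
/* Fixed-point parameters, matching the kernel's load-average constants. */
#define FSHIFT  11                  /* bits of fractional precision */
#define FIXED_1 (1UL << FSHIFT)     /* 1.0 in fixed point */
#define EXP_1   1884                /* 1/exp(5s/1min) in fixed point */
#define EXP_5   2014                /* 1/exp(5s/5min) */
#define EXP_15  2037                /* 1/exp(5s/15min) */

/*
 * One sampling step (the kernel samples roughly every 5 s):
 *   load' = load * e + active * (1 - e)
 * load and active are fixed-point values; the result is truncated.
 * Hypothetical helper, modeled on the kernel's calc_load().
 */
static unsigned long calc_load_step(unsigned long load, unsigned long exp,
                                    unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* Apply `steps` samples while nr_active tasks are constantly runnable. */
static unsigned long run_steps(unsigned long load, unsigned long exp,
                               unsigned long nr_active, unsigned int steps)
{
    for (unsigned int i = 0; i < steps; i++)
        load = calc_load_step(load, exp, nr_active * FIXED_1);
    return load;
}
```

With three CPU hogs, run_steps(0, EXP_15, 3, 10000) settles close to, and in this truncating sketch slightly below, 3 * FIXED_1. With nr_active = 0 any starting value decays to exactly 0, because truncating load * EXP_15 >> FSHIFT strictly reduces every nonzero value.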

> Steps to reproduce: run a bunch of CPU-bound processes that will not use
> all available cycles. The biggest difference between expected and
> measured load is around 30% CPU utilization in my case.

Hrmm, this suggests we age too hard in the nohz code. In your test case,
is there significant idle time? That is, suppose you run each CPU at 30%:
what is the period of your load? Running 3s out of 10s is significantly
different from running .3ms out of 1ms.
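
One way the period can matter is aliasing: if the active count is only looked at once per sample interval, a task with a short period can line up with the sampler. The toy model below counts how many of n samples, taken every interval_us microseconds, catch a single task that runs run_us out of every period_us. The 5010000 us interval assumes LOAD_FREQ = 5*HZ+1 ticks with a hypothetical HZ=100, and samples_running() is a hypothetical name; the model ignores per-tick accounting and the nohz folding logic entirely, so it only illustrates the sampling effect.

```c
#include <stdint.h>

/*
 * Toy model: a task runs during [offset + j*period, offset + j*period + run)
 * for every integer j. Sample at t = k * interval for k = 1..n and count the
 * samples that find it running. All times are in microseconds.
 */
static unsigned int samples_running(uint64_t period_us, uint64_t run_us,
                                    uint64_t offset_us, uint64_t interval_us,
                                    unsigned int n)
{
    unsigned int hits = 0;

    for (unsigned int k = 1; k <= n; k++) {
        uint64_t t = (uint64_t)k * interval_us;
        /* Position of t within the task's cycle, measured from its start. */
        uint64_t phase = (t + period_us - offset_us % period_us) % period_us;

        if (phase < run_us)
            hits++;
    }
    return hits;
}
```

For 3 s out of 10 s, samples_running(10000000, 3000000, 0, 5010000, 1000) returns 300, so the sampled fraction matches the 30% duty cycle. For 0.3 ms out of 1 ms, 5010000 is an exact multiple of the 1000 us period, so every sample lands at the same phase: with offset 0 all 1000 samples see the task running, and with offset 500 none do.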

> Have there been any other patches that correct the load calculation?
> Maybe I'm testing it the wrong way? I'd appreciate any suggestions. I'd be
> happy to test new patches. Sadly, I cannot propose any fixes as kernel
> sources are still a mystery to me.

Darned load-tracking stuff. I went over it again but couldn't spot
anything obviously broken. I suspect the tail magic of
calc_global_nohz() is busted; I'm just not seeing it at the moment.
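
For context on the kind of catch-up arithmetic involved: when sample windows are missed (for example while every CPU sleeps in nohz mode), the decay can be applied in a single step, because updating n times with a constant active count a gives L_n = L_0 * e^n + a * (1 - e^n), i.e. one update with e^n in place of e. The fixed-point sketch below shows that identity; fixed_pow(), calc_load_n() and calc_load_step() are hypothetical names, and this is not claimed to be the code in calc_global_nohz().

```c
/* Same fixed-point conventions as the kernel's load average. */
#define FSHIFT  11
#define FIXED_1 (1UL << FSHIFT)
#define EXP_15  2037

/* One sample: load' = load * e + active * (1 - e), truncated. */
static unsigned long calc_load_step(unsigned long load, unsigned long exp,
                                    unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* x^n for fixed-point x with frac_bits fractional bits
 * (square-and-multiply, rounding to nearest at each step). */
static unsigned long fixed_pow(unsigned long x, unsigned int frac_bits,
                               unsigned int n)
{
    unsigned long result = 1UL << frac_bits;     /* 1.0 */

    while (n) {
        if (n & 1)
            result = (result * x + (1UL << (frac_bits - 1))) >> frac_bits;
        n >>= 1;
        if (n)
            x = (x * x + (1UL << (frac_bits - 1))) >> frac_bits;
    }
    return result;
}

/* Catch up n missed samples at once, holding active constant. */
static unsigned long calc_load_n(unsigned long load, unsigned long exp,
                                 unsigned long active, unsigned int n)
{
    return calc_load_step(load, fixed_pow(exp, FSHIFT, n), active);
}
```

Computing e^n once keeps the cost at O(log n) regardless of how long the machine slept; the result differs from n truncating single steps only by a few fixed-point units of rounding.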

Will go brew myself a fresh pot of tea and stare more.

