Date:	Tue, 16 Oct 2007 12:34:35 +0200
From:	Christian Borntraeger <borntraeger@...ibm.com>
To:	balbir@...ux.vnet.ibm.com
Cc:	Chuck Ebbert <cebbert@...hat.com>, Frans Pop <elendil@...net.nl>,
	Greg KH <greg@...ah.com>, stable@...nel.org,
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

On Tuesday, 16 October 2007, Balbir Singh wrote:
> I am trying to think out loud as to what the root cause of the problem
> might be. In one of the discussion threads, I saw utime going backwards,
> which seemed very odd. I suspect that those are rounding errors.
> 
> I don't understand your explanation below
> 
> Initially utime = 9, stime = 0, sum_exec_runtime = S1
> 
> Later
> 
> utime = 9, stime = 1, sum_exec_runtime = S2
> 
> We can be sure that S >= (utime + stime)

I think this is the problem. How can we be sure? We can't. utime and stime
are sampled, so they can be far off in either direction if the program
sleeps often and manages to synchronize itself to the timer tick. Let's say
a program only does a simple system call and then sleeps. sum_exec_runtime
is then increased by, say, 1000 cycles, which on a 1 GHz box means 1000 ns.
If the timer tick happens exactly at this moment, stime is increased by
1 tick, i.e. 1,000,000 ns (at HZ=1000).
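To make the tick-sampling effect concrete, here is a small model of that scenario. It is a sketch under stated assumptions, not kernel code: HZ=1000, each wakeup costs about 1000 ns, and every tick fires while the task is inside the system call. The names `simulate_tick_synchronized`, `TICK_NS` and `RUN_NS` are hypothetical.

```python
# Assumption: HZ=1000, so one timer tick is 1,000,000 ns.
TICK_NS = 1_000_000
# Assumption: a simple system call costs ~1000 cycles on a 1 GHz CPU,
# i.e. about 1000 ns of real CPU time per wakeup.
RUN_NS = 1_000


def simulate_tick_synchronized(ticks: int) -> tuple[int, int]:
    """Hypothetical model of a task whose wakeups line up with the tick.

    Returns (sum_exec_runtime_ns, sampled_stime_ns) after `ticks` ticks.
    """
    sum_exec_runtime_ns = 0  # precise accounting: CPU time actually used
    stime_ticks = 0          # tick sampling: whole ticks charged to stime
    for _ in range(ticks):
        sum_exec_runtime_ns += RUN_NS
        # The tick fires while the task is inside the system call,
        # so the sampler charges the entire tick to stime.
        stime_ticks += 1
    return sum_exec_runtime_ns, stime_ticks * TICK_NS


precise, sampled = simulate_tick_synchronized(1000)  # 1 s of wall time
print(f"sum_exec_runtime = {precise} ns, sampled stime = {sampled} ns")
print(f"sampled stime is {sampled // precise}x the real runtime")
```

Over one simulated second the task really uses 1,000,000 ns of CPU, yet the sampler charges 1,000,000,000 ns to stime, a factor of 1000. So the sampled values can be far larger than the precise runtime, not just off by rounding.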

Maybe there is some magic in the code that I did not see, but the problem
obviously exists. In Frans' data (stime+utime) never decreases, but utime
and stime individually do. If you look at Frans' data you see:
Oct 16 11:54:48 8 10
Oct 16 11:54:49 6 12  <-- utime
Oct 16 11:54:50 6 12
Oct 16 11:54:51 6 12
Oct 16 11:54:52 8 10  <-- stime
Oct 16 11:54:53 8 10
Oct 16 11:54:54 8 10
Oct 16 11:54:55 8 12
Oct 16 11:54:56 8 12

(stime+utime) stays constant across these jumps. That means that S2-S1 must
be smaller than one tick (see the calculation in task_stime()). I am quite
sure the jumps are caused by changes in the sampled values p->utime and
p->stime.
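The effect on the reported values can be illustrated with a rough model of the proportional split discussed in this thread: utime is scaled by the sampled utime/(utime+stime) ratio, and stime is assumed to be the remainder, so the sum tracks the precise runtime. This is an approximation, not the actual task_utime()/task_stime() code; `scaled_times` and the numbers are made up for illustration, with runtime expressed in clock ticks and Python's `//` standing in for truncating integer division.

```python
def scaled_times(runtime: int, utime_samples: int, stime_samples: int) -> tuple[int, int]:
    """Split the precise runtime in proportion to the sampled utime/stime.

    Hypothetical model: utime is scaled by utime/(utime+stime) with
    truncating integer division, and stime is assumed to be the remainder,
    so utime + stime always equals the precise runtime.
    """
    total = utime_samples + stime_samples
    # With no samples at all, this sketch reports everything as utime.
    utime = runtime * utime_samples // total if total else runtime
    stime = runtime - utime
    return utime, stime


# Illustrative numbers: the precise runtime stays at 18 ticks (it grew by
# less than one tick), while one extra tick is sampled as stime.
before = scaled_times(18, utime_samples=9, stime_samples=0)
after = scaled_times(18, utime_samples=9, stime_samples=1)
print("before:", before, "after:", after)
```

With the precise runtime unchanged, one extra sampled stime tick moves the split from (18, 0) to (16, 2): utime goes backwards while the sum stays constant, the same pattern as in Frans' data.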

> 
> If S2 = S1 + delta, then as per our calculation
> 
> Initially
> 
> utime_proc = (utime * (S1))/(utime + stime)
>            = nsec_to_clock_t(9 * S1 / 9)
> 
> later
> 
> utime_proc = nsec_to_clock_t(9 * S2/10)
> 
> Given that S >= (utime + stime), we should be fine.
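Taking the formula quoted above at face value, with δ = S2 − S1 and u_1, u_2 the reported utime before and after, the condition for utime to go backwards can be worked out directly. This is a sketch that ignores the nsec_to_clock_t() rounding:

```latex
\begin{aligned}
u_1 &= \frac{9\,S_1}{9 + 0} = S_1, \qquad
u_2 = \frac{9\,S_2}{9 + 1} = \frac{9\,(S_1 + \delta)}{10},\\[4pt]
u_2 < u_1 &\iff 9\,(S_1 + \delta) < 10\,S_1 \iff \delta < \frac{S_1}{9},\\[4pt]
\text{in general } (u > 0):\quad
\frac{u\,S_1}{u + s} > \frac{u\,(S_1 + \delta)}{u + s + 1}
  &\iff \delta < \frac{S_1}{u + s}.
\end{aligned}
```

So the reported utime decreases whenever the precise runtime grows by less than S1/9 between the two reads. If S1 is at least 9 ticks, any sub-tick δ, such as the 1000 ns system call described earlier, satisfies this condition.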



