linux-kernel - Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Message-Id: <200710291305.52992.elendil@planet.nl>
Date:	Mon, 29 Oct 2007 13:05:51 +0100
From:	Frans Pop <elendil@...net.nl>
To:	balbir@...ux.vnet.ibm.com
Cc:	Christian Borntraeger <borntraeger@...ibm.com>,
	Chuck Ebbert <cebbert@...hat.com>, Greg KH <greg@...ah.com>,
	stable@...nel.org, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Hello Balbir,

On Tuesday 16 October 2007, Balbir Singh wrote:
> Christian Borntraeger wrote:
> > On Tuesday, 16 October 2007, Balbir Singh wrote:
> >> I am trying to think out loud as to what the root cause of the problem
> >> might be. In one of the discussion threads, I saw utime going
> >> backwards, which seemed very odd; I suspect that those are rounding
> >> errors.
> >>
> >> I don't understand your explanation below
> >>
> >> Initially utime = 9, stime = 0, sum_exec_runtime = S1
> >>
> >> Later
> >>
> >> utime = 9, stime = 1, sum_exec_runtime = S2
> >>
> >> We can be sure that S >= (utime + stime)
> >
> > I think here is the problem. How can we be sure? We can't. utime and
> > stime are sampled, so they can be largely off in any direction, if the
> > program sleeps often and manages to synchronize itself to the timer
> > tick. Let's say a program only does a simple system call and then
> > sleeps. So sum_exec_runtime is increased by, let's say, 1000 cycles on
> > a 1 GHz box, which means 1000ns. If now the timer tick happens exactly
> > at this moment, stime is increased by 1 tick = 1000000ns.
>
> Yes, I thought of that just after I sent out my email. In the case that
> you mention, the utime and stime accounting is incorrect anyway :-)
> I think we need to find a better solution. I was going to propose that
> we round correctly in (the divisions in)
>
> 1. task_utime()
> 2. clock_t_to_cputime()
>
> I suspect we'll need to round task_utime() to p->utime if the value of
> task_utime() < p->utime and the same thing for task_stime(). I've tried
> reproducing the problem on my UML setup without any success. Let me
> try and grab an x86 box.
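
For illustration only, the effect discussed above can be sketched in
userspace C. This is not the kernel's code: the helper names
scaled_utime_ns() and clamp_to_sampled_ns() are hypothetical, and the
sketch merely assumes that the reported utime is the precise runtime
split by the ratio of the tick-sampled utime and stime, with
1 tick = 1000000ns as in the example quoted above.

```c
#include <stdint.h>

/* Assumption taken from the quoted example: 1 tick = 1000000ns. */
#define TICK_NS 1000000ULL

/*
 * Hypothetical helper: split a precise runtime (in ns) according to the
 * ratio of the tick-sampled utime and stime counters.
 */
static uint64_t scaled_utime_ns(uint64_t runtime_ns,
                                uint64_t utime_ticks,
                                uint64_t stime_ticks)
{
    uint64_t total = utime_ticks + stime_ticks;

    if (total == 0)
        return runtime_ns;  /* no samples yet, nothing to scale by */

    return runtime_ns * utime_ticks / total;
}

/*
 * Hypothetical clamp along the lines suggested in the quoted text:
 * if the scaled value falls below the sampled utime, report the
 * sampled utime instead.
 */
static uint64_t clamp_to_sampled_ns(uint64_t scaled_ns, uint64_t utime_ticks)
{
    uint64_t sampled_ns = utime_ticks * TICK_NS;

    return scaled_ns < sampled_ns ? sampled_ns : scaled_ns;
}
```

With utime = 9, stime = 0 and a runtime of 9000000ns, the scaled value
is 9000000ns. If a tick then lands on a 1000ns system call (utime = 9,
stime = 1, runtime = 9001000ns), the scaled value drops to 8100900ns, so
the reported utime goes backwards even though the runtime increased.
Under these assumptions the clamp would report 9000000ns instead.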

Any progress on this issue? I noticed that it's still there in current git.

If a better implementation is not expected any time soon, how about an ACK 
on the reversion patch Christian proposed in
http://lkml.org/lkml/2007/10/16/76
so we can at least get rid of the regression?

Thanks,
Frans Pop