linux-kernel - Re: Runtime accounting bug?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <alpine.LNX.1.00.0807130335400.3302@monolith>
Date:	Sun, 13 Jul 2008 04:18:36 -0400 (EDT)
From:	George Glover <hyperborean@...cast.net>
To:	Andrew Morton <akpm@...ux-foundation.org>
cc:	linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Runtime accounting bug?



On Sun, 13 Jul 2008, Andrew Morton wrote:

> On Sun, 6 Jul 2008 14:00:22 -0400 (EDT) George Glover <hyperborean@...cast.net> wrote:
>
>> Hello,
>>
>> I run two copies (dual processor system) of "mprime" from the GIMPS
>> project. After a while running (weeks?) the cumulative runtime reported by
>> top increments faster than real time, then after a while (unknown how long)
>> the value increments normally again.  Then, maybe something overflows - but
>> runtime accounting stops entirely even though the process is in the run state.
>>
>> (mprime is a cpu-bound low priority process like seti@...e and friends.)
>>
>> I presently have a stuck process and a one that should soon start to
>> increment faster than possible.
>>
>> I have verified that the "stuck" process is indeed running since it continues
>> to generate output.
>>
>> Here is the "stuck" process:
>>
>> cat /proc/4126/stat; sleep 5;  cat /proc/4126/stat
>> 4126 (mprime) R 2984 4126 2984 34819 4126 4202496 16530 0 4 0 2124505930 661087 0 0 39 19 1 0 8442861 21483520 3733 4294967295 134512640 138881564 3220348480 3220345732 135248565 0 0 0 16386 0 0 0 17 1 0 0 0 0 0
>                                                                ^^^^^^^^^^
>> 4126 (mprime) R 2984 4126 2984 34819 4126 4202496 16530 0 4 0 2124505930 661087 0 0 39 19 1 0 8442861 21483520 3733 4294967295 134512640 138881564 3220348480 3220345736 135241038 0 0 0 16386 0 0 0 17 1 0 0 0 0 0
>                                                                ^^^^^^^^^^
>
> yes?  The 14th column, do_task_stat()'s cputime_to_clock_t(utime)?

Yup, that is what I discovered with a brief tour of the source I could 
find.  That particular task started to count up again a few days later, 
however at an accelerated rate 1.4-1.6x real time measuring that 14th 
field.  I am wondering if it is related to a timer problem that has been
reported recently:

 	http://lkml.org/lkml/2008/7/9/330

I can't say I've noticed my system clock advancing any faster however, 
maybe it's affecting only of the cpus and my clock app runs on the other.

I've rebooted since I wrote the message, so it may be a while before repeat 
behavior.

> My suspicion wold be that some of the arithmetic in here:
>
> static cputime_t task_utime(struct task_struct *p)
> {
> 	clock_t utime = cputime_to_clock_t(p->utime),
> 		total = utime + cputime_to_clock_t(p->stime);
> 	u64 temp;
>
> 	/*
> 	 * Use CFS's precise accounting:
> 	 */
> 	temp = (u64)nsec_to_clock_t(p->se.sum_exec_runtime);
>
> 	if (total) {
> 		temp *= utime;
> 		do_div(temp, total);
> 	}
> 	utime = (clock_t)temp;
>
> 	p->prev_utime = max(p->prev_utime, clock_t_to_cputime(utime));
> 	return p->prev_utime;
> }
>
> is suffering overflows at very high values of p->utime.
>
> It's a 32bit machine, yes?

Yes it is.

> It would useful to know what your values of TICK_NSEC, NSEC_PER_SEC and
> USER_HZ are.

    TICK_NSEC is 999848
NSEC_PER_SEC is 1000000000
      USER_HZ is 100

Since the process did start ticking again I wonder if is not an overflow 
and related to that other post I linked.  I don't know enough about the 
various areas to judge though.

Let me know what else I can do if anything,

George
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/