linux-kernel - Re: process time < thread time?

Message-ID: <alpine.LFD.2.02.1109011148060.2723@ionos>
Date:	Thu, 1 Sep 2011 11:56:42 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	David Miller <davem@...emloft.net>
cc:	peterz@...radead.org, linux-kernel@...r.kernel.org
Subject: Re: process time < thread time?

Dave,

On Wed, 31 Aug 2011, David Miller wrote:
> If someone who understands our thread/process time implementation can
> look into this, I'd appreciate it.
> 
> Attached below is a watered-down version of rt/tst-cpuclock2.c from
> GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
> similar.
> 
> Run it several times, and you will see cases where the main thread
> will measure a process clock difference before and after the nanosleep
> which is smaller than the cpu-burner thread's individual thread clock
> difference.  This doesn't make any sense since the cpu-burner thread
> is part of the top-level process's thread group.
> 
> I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
> 64-bit binaries).
> 
> For example:
> 
> [davem@...icha build-x86_64-linux]$ ./test
> process: before(0.001221967) after(0.498624371) diff(497402404)
> thread:  before(0.000081692) after(0.498316431) diff(498234739)
> self:    before(0.001223521) after(0.001240219) diff(16698)
> [davem@...icha build-x86_64-linux]$ 
> 
> The diff of 'process' should always be >= the diff of 'thread'.
> 
> I make sure to wrap the 'thread' clock measurements the most tightly
> around the nanosleep() call, and that the 'process' clock measurements
> are the outer-most ones.
> 
> I suspect this might be some kind of artifact of how the partial
> runqueue ->clock and ->clock_task updates work?  Maybe some weird
> interaction with ->skip_clock_update?
> 
> Or is this some known issue?
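The attachment Dave mentions is not included in this reply. The following is a minimal sketch of the measurement he describes, not the GLIBC rt/tst-cpuclock2.c source. The names cpu_burner, measure_cpuclocks and struct cpuclock_diffs are hypothetical. A busy-looping thread runs while the main thread sleeps. The process CPU clock is read outermost, and the burner's per-thread clock (obtained with pthread_getcpuclockid()) is read innermost around nanosleep().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical sketch of the described test, not the GLIBC source. */
static atomic_int stop_burning;

/* The "cpu-burner": spin until told to stop. */
static void *cpu_burner(void *arg)
{
    (void)arg;
    volatile uint64_t spin = 0;
    while (!atomic_load_explicit(&stop_burning, memory_order_relaxed))
        spin++;
    return NULL;
}

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

struct cpuclock_diffs {
    int64_t process_ns; /* CLOCK_PROCESS_CPUTIME_ID delta (outermost reads) */
    int64_t thread_ns;  /* burner thread's CPU clock delta (innermost reads) */
};

/* Returns 0 on success, -1 if any call fails. */
static int measure_cpuclocks(long sleep_ns, struct cpuclock_diffs *out)
{
    pthread_t th;
    clockid_t th_clock;
    struct timespec p0, p1, t0, t1;
    struct timespec nap = { .tv_sec = sleep_ns / 1000000000L,
                            .tv_nsec = sleep_ns % 1000000000L };
    int rc = -1;

    atomic_store(&stop_burning, 0);
    if (pthread_create(&th, NULL, cpu_burner, NULL) != 0)
        return -1;

    if (pthread_getcpuclockid(th, &th_clock) == 0 &&
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &p0) == 0 && /* outermost */
        clock_gettime(th_clock, &t0) == 0 &&                 /* innermost */
        nanosleep(&nap, NULL) == 0 &&
        clock_gettime(th_clock, &t1) == 0 &&
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &p1) == 0) {
        out->process_ns = ts_to_ns(&p1) - ts_to_ns(&p0);
        out->thread_ns = ts_to_ns(&t1) - ts_to_ns(&t0);
        rc = 0;
    }

    /* Stop the burner only after its clock has been read for the last time. */
    atomic_store(&stop_burning, 1);
    pthread_join(th, NULL);
    return rc;
}
```

Calling measure_cpuclocks(500000000L, &d) roughly matches the 0.5 s sleep in the quoted output. The sketch deliberately does not assert d.process_ns >= d.thread_ns. Per the report, that is exactly the invariant that fails on SMP.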

That's an SMP artifact. If you run "taskset 01 ./test" the result is
always correct.
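For reference, taskset's mask 01 selects CPU 0 only. A rough programmatic equivalent uses sched_setaffinity(). The helper name pin_self_to_cpu is hypothetical. sched_setaffinity(0, ...) applies only to the calling thread. New threads inherit their creator's mask, so the helper should be called before pthread_create(). taskset sets the mask before the program starts, so every thread inherits it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single CPU,
 * roughly what "taskset 01 ./test" does for the whole program
 * (mask 0x1 = CPU 0). Returns 0 on success, -1 on error (errno set). */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE); /* CPU_SET is undefined outside this range */
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Pinning hides the problem rather than fixing it. With one CPU, the hog is never running while the main thread reads the clocks. Its runtime was presumably already folded in when it was switched out.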

The deviation on SMP comes from how the thread times are accumulated
in thread_group_cputime(): we sum t->se.sum_exec_runtime over all
threads. If the hog thread is currently running on the other core
(which is likely), that field does not yet include the runtime the hog
has accrued since its last scheduler update, so the process sum can
lag behind the hog's own thread clock.
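The following is a user-space model of the staleness described above. It is not kernel code. struct model_task and the model_* functions are hypothetical, and only the field names mirror the kernel's. The stale sum reads the stored field only. The fresh variant adds the time accrued since the last update for a task that is currently running, which is the idea behind task_sched_runtime(). The real function also has to handle runqueue locking, which the model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified per-task runtime bookkeeping (ns). */
struct model_task {
    uint64_t sum_exec_runtime; /* runtime folded in at the last scheduler update */
    uint64_t exec_start;       /* clock value at that last update */
    bool on_cpu;               /* currently running on some CPU */
};

/* Pre-patch behaviour: sum the stored fields only. */
static uint64_t model_group_runtime_stale(const struct model_task *t, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += t[i].sum_exec_runtime;
    return sum;
}

/* Model of the task_sched_runtime() idea: include the pending delta
 * for a task that is running right now. */
static uint64_t model_task_runtime(const struct model_task *t, uint64_t now)
{
    uint64_t ns = t->sum_exec_runtime;
    if (t->on_cpu && now > t->exec_start)
        ns += now - t->exec_start;
    return ns;
}

/* Post-patch behaviour: sum the up-to-date per-task runtimes. */
static uint64_t model_group_runtime_fresh(const struct model_task *t, size_t n,
                                          uint64_t now)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += model_task_runtime(&t[i], now);
    return sum;
}
```

Consider a sleeping main thread that has accounted 1 ms, and a hog whose last update recorded 100 ms. If both are read at 104 ms, the stale sum is 101 ms while the hog's own clock reads 104 ms. That is the process < thread inversion from the report. The fresh sum is 105 ms.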

The untested patch below should cure this.

Thanks,

	tglx

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 58f405b..42378cb 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -250,7 +250,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 	do {
 		times->utime = cputime_add(times->utime, t->utime);
 		times->stime = cputime_add(times->stime, t->stime);
-		times->sum_exec_runtime += t->se.sum_exec_runtime;
+		times->sum_exec_runtime += task_sched_runtime(t);
 	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
