Date:	Thu, 1 Sep 2011 11:56:42 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	David Miller <davem@...emloft.net>
cc:	peterz@...radead.org, linux-kernel@...r.kernel.org
Subject: Re: process time < thread time?

Dave,

On Wed, 31 Aug 2011, David Miller wrote:
> If someone who understands our thread/process time implementation can
> look into this, I'd appreciate it.
> 
> Attached below is a watered-down version of rt/tst-cpuclock2.c from
> GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
> similar.
> 
> Run it several times, and you will see cases where the main thread
> will measure a process clock difference before and after the nanosleep
> which is smaller than the cpu-burner thread's individual thread clock
> difference.  This doesn't make any sense since the cpu-burner thread
> is part of the top-level process's thread group.
> 
> I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
> 64-bit binaries).
> 
> For example:
> 
> [davem@...icha build-x86_64-linux]$ ./test
> process: before(0.001221967) after(0.498624371) diff(497402404)
> thread:  before(0.000081692) after(0.498316431) diff(498234739)
> self:    before(0.001223521) after(0.001240219) diff(16698)
> [davem@...icha build-x86_64-linux]$ 
> 
> The diff of 'process' should always be >= the diff of 'thread'.
> 
> I make sure the 'thread' clock measurements are wrapped most tightly
> around the nanosleep() call, and that the 'process' clock measurements
> are the outermost ones.
> 
> I suspect this might be some kind of artifact of how the partial
> runqueue ->clock and ->clock_task updates work?  Maybe some weird
> interaction with ->skip_clock_update?
> 
> Or is this some known issue?
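
(The watered-down test is not included in the quoted mail above; a rough
approximation of what it does, not the actual GLIBC rt/tst-cpuclock2.c,
with purely illustrative names and a ~0.5s window, might look like this:)

/*
 * Build with: gcc -o test test.c -lpthread -lrt
 *
 * A spawned thread burns CPU while the main thread samples the
 * process CPU clock (outermost) and the burner thread's CPU clock
 * (tightly around a nanosleep), then prints both diffs.
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static volatile int stop;

static void *burner(void *arg)
{
	(void)arg;
	while (!stop)
		;	/* burn CPU, ideally on another core */
	return NULL;
}

static long long diff_ns(struct timespec a, struct timespec b)
{
	return (b.tv_sec - a.tv_sec) * 1000000000LL +
	       (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
	struct timespec sleep_ts = { 0, 500000000 };	/* ~0.5s */
	struct timespec proc_before, proc_after, thr_before, thr_after;
	clockid_t thread_clock;
	pthread_t th;

	pthread_create(&th, NULL, burner, NULL);
	pthread_getcpuclockid(th, &thread_clock);

	/* Process clock outermost, thread clock tight around nanosleep. */
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &proc_before);
	clock_gettime(thread_clock, &thr_before);
	nanosleep(&sleep_ts, NULL);
	clock_gettime(thread_clock, &thr_after);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &proc_after);

	stop = 1;
	pthread_join(th, NULL);

	/* Expectation: process diff >= thread diff, since the burner
	 * thread belongs to this process's thread group. */
	printf("process: diff(%lld)\n", diff_ns(proc_before, proc_after));
	printf("thread:  diff(%lld)\n", diff_ns(thr_before, thr_after));
	return 0;
}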

That's an SMP artifact. If you run "taskset 01 ./test" the result is
always correct.

The reason this shows deviations on SMP is how the thread times are
accumulated in thread_group_cputime(): we sum t->se.sum_exec_runtime
over all threads. If the hog thread is currently running on another
core (which is likely), its sum_exec_runtime field is not up to date
at the time we sum it.
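
The fix is to use task_sched_runtime() instead, which takes the task's
runqueue lock and folds in the still-unaccounted slice of a currently
running task. Conceptually it does something like the following
(a simplified sketch only, not verbatim kernel source; the locking
helpers and field names differ slightly between kernel versions):

u64 task_sched_runtime_sketch(struct task_struct *p)
{
	unsigned long flags;
	struct rq *rq = task_rq_lock(p, &flags);
	u64 ns = p->se.sum_exec_runtime;

	if (task_current(rq, p)) {
		/* p is on a CPU right now: add the delta that has not
		 * been folded into sum_exec_runtime yet. */
		update_rq_clock(rq);
		ns += rq->clock_task - p->se.exec_start;
	}
	task_rq_unlock(rq, &flags);
	return ns;
}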

The untested patch below should cure this.

Thanks,

	tglx

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 58f405b..42378cb 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -250,7 +250,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 	do {
 		times->utime = cputime_add(times->utime, t->utime);
 		times->stime = cputime_add(times->stime, t->stime);
-		times->sum_exec_runtime += t->se.sum_exec_runtime;
+		times->sum_exec_runtime += task_sched_runtime(t);
 	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();


