Message-ID: <20140812142539.01851e52@annuminas.surriel.com>
Date:	Tue, 12 Aug 2014 14:25:39 -0400
From:	Rik van Riel <riel@...hat.com>
To:	linux-kernel@...r.kernel.org
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
	Frank Mayhar <fmayhar@...gle.com>,
	Frederic Weisbecker <fweisbec@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Sanjay Rao <srao@...hat.com>,
	Larry Woodman <lwoodman@...hat.com>
Subject: [PATCH RFC] time: drop do_sys_times spinlock

Back in 2009, Spencer Candland pointed out there is a race with
do_sys_times, where multiple threads calling do_sys_times can
sometimes get decreasing results.

https://lkml.org/lkml/2009/11/3/522

As a result of that discussion, some of the code in do_sys_times
was moved under a spinlock.

However, that does not seem to actually make the race go away on
larger systems. One obvious remaining race is that just as one thread
is about to return from do_sys_times, it can be preempted by another
thread, which also runs do_sys_times and stores a larger value in
the shared variable than the one the first thread got.

This race is on the kernel/userspace boundary, and not fixable
with spinlocks.
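
To illustrate why no kernel-side lock can close that window, here is a
minimal userspace sketch. It is not Spencer Candland's original test case;
the names last_utime, backwards_count, sample_times and
count_backwards_utime are hypothetical. Each thread calls times(), then
compares its result against a shared maximum. A thread preempted between
the return from times() and the comparison will see a larger published
value and count a "backwards" sample, no matter what locking
do_sys_times uses.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <sys/times.h>

#define MAX_THREADS 64

/* Hypothetical shared state for this sketch. */
static _Atomic long last_utime;      /* largest tms_utime published so far */
static _Atomic long backwards_count; /* samples below the published maximum */

static void *sample_times(void *arg)
{
	long iterations = *(long *)arg;
	struct tms t;

	for (long i = 0; i < iterations; i++) {
		times(&t);
		/*
		 * Window: this thread can be preempted here, after the
		 * syscall has returned but before the comparison below.
		 */
		long mine = (long)t.tms_utime;
		long seen = atomic_load(&last_utime);

		if (mine < seen) {
			atomic_fetch_add(&backwards_count, 1);
			continue;
		}
		/* Publish mine as the new maximum unless someone beat us. */
		while (seen < mine &&
		       !atomic_compare_exchange_weak(&last_utime, &seen, mine))
			;
	}
	return NULL;
}

/* Run nthreads samplers and return how many backwards samples were seen. */
static long count_backwards_utime(int nthreads, long iterations)
{
	pthread_t tid[MAX_THREADS];
	int started = 0;

	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;
	atomic_store(&last_utime, 0);
	atomic_store(&backwards_count, 0);

	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[started], NULL, sample_times,
				   &iterations) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	return atomic_load(&backwards_count);
}
```

Calling count_backwards_utime() with a thread count near the number of
CPUs and a large iteration count makes the window easier to hit. The
count it returns depends on scheduling and will vary from run to run.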

Removing the spinlock from do_sys_times does not seem to result
in an increase in the number of times a decreasing utime is
observed when running the test case. In fact, on the 80 CPU test
system that I tried, I saw a small decrease, from an average of
14.8 to 6.5 instances of backwards utime when running the test case.

Back in 2009, in changeset 2b5fe6de5 Oleg Nesterov already found
that it should be safe to remove the spinlock.  I believe this is
true, because it appears that nobody changes another task's ->sighand
pointer, except at fork time and exit time, during which the task
cannot be in do_sys_times.

This is subtle enough to warrant documenting.

The increased scalability from removing the spinlock should help
things like databases and middleware that measure the resource
use of every query processed.
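
For context, the per-query accounting pattern mentioned above might look
roughly like the sketch below. This is an assumption about how such
software uses times(), not code taken from any particular database;
struct query_cost, measure_query and spin_work are hypothetical names.
Every measured query costs two times() calls, which is why contention
inside do_sys_times matters on large systems.

```c
#include <sys/times.h>
#include <unistd.h>

/* Hypothetical result type: CPU seconds charged to one query. */
struct query_cost {
	double user_sec;
	double sys_sec;
};

/* Stand-in workload for a query; burns some user CPU time. */
static void spin_work(void *arg)
{
	volatile unsigned long sink = 0;
	unsigned long n = *(unsigned long *)arg;

	for (unsigned long i = 0; i < n; i++)
		sink += i;
}

/*
 * Run fn(arg) and report the CPU time consumed while it ran.
 * times() reports CPU time for the whole process, so in a
 * multithreaded server the delta also includes other threads.
 */
static struct query_cost measure_query(void (*fn)(void *), void *arg)
{
	struct tms before, after;
	struct query_cost cost;
	long ticks = sysconf(_SC_CLK_TCK); /* clock ticks per second */

	times(&before);
	fn(arg);
	times(&after);

	cost.user_sec = (double)(after.tms_utime - before.tms_utime) / ticks;
	cost.sys_sec = (double)(after.tms_stime - before.tms_stime) / ticks;
	return cost;
}
```

A caller would pass its own query handler in place of spin_work and log
or aggregate the returned user_sec and sys_sec values.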

Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Oleg Nesterov <oleg@...hat.com>
Cc: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
Cc: Frank Mayhar <fmayhar@...gle.com>
Cc: Frederic Weisbecker <fweisbec@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Sanjay Rao <srao@...hat.com>
Cc: Larry Woodman <lwoodman@...hat.com>
Signed-off-by: Rik van Riel <riel@...hat.com>
---
 kernel/sys.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 66a751e..cb81ce4 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -862,11 +862,15 @@ void do_sys_times(struct tms *tms)
 {
 	cputime_t tgutime, tgstime, cutime, cstime;

-	spin_lock_irq(&current->sighand->siglock);
+	/*
+	 * sys_times gets away with not locking &current->sighand->siglock
+	 * because most of the time only the current process gets to change
+	 * its own sighand pointer. The exception is exit, which changes
+	 * the sighand pointer of an exiting process.
+	 */
 	thread_group_cputime_adjusted(current, &tgutime, &tgstime);
 	cutime = current->signal->cutime;
 	cstime = current->signal->cstime;
-	spin_unlock_irq(&current->sighand->siglock);
 	tms->tms_utime = cputime_to_clock_t(tgutime);
 	tms->tms_stime = cputime_to_clock_t(tgstime);
 	tms->tms_cutime = cputime_to_clock_t(cutime);
--