lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <1291833054.5015.1133.camel@gandalf.stny.rr.com>
Date:	Wed, 08 Dec 2010 13:30:54 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Roman Zippel <zippel@...ux-m68k.org>,
	John Stultz <johnstul@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: [RFC][PATCH] timekeeping: Keep xtime_nsec remainder separate from
 ntp_error

While doing my end of year unlikely() cleanup, running the annotate
branch profiler, I came across this:

 correct incorrect  %        Function                  File              Line
 ------- ---------  -        --------                  ----              ----
  122588 65653641  99 timekeeping_adjust             timekeeping.c        664
  167493 14584927  98 timekeeping_adjust             timekeeping.c        658

This shows that the following likely()'s are wrong most of the time:

	if (error > interval) {
		error >>= 2;
		if (likely(error <= interval))
			adj = 1;
		else
			adj = timekeeping_bigadjust(error, &interval, &offset);
	} else if (error < -interval) {
		error >>= 2;
		if (likely(error >= -interval)) {

Talking about this with John Stultz, we both agreed that the annotations
should be correct and is not the issue, but something else is going
wrong.

Adding in trace_printks(), I saw that the adj values that were added to
the "mult" multiplier were sometimes quite large. The time intervals
never got down into a small error, but instead was making large
oscillations, both positive and negative to where it should be.

John noticed that if he removed the commit:

commit 5cd1c9c5cf30d4b33df3d3f74d8142f278d536b7
timekeeping: fix rounding problem during clock update

that the problem would go away and we would get back into a tight
oscillation that would stay within the fast path (and the likely()'s
were again likely).

What the above commit did was to fix a bug that caused time to go
backward a nanosec due to the truncating of the xtime_nsec shifted into
the xtime.tv_nsec.  The fix for that bug (and what that commit did) was
to always round up one. It added +1 to the xtime.tv_nsec after it did
the conversion, and then took the difference between this shifted and
the xtime_nsec and stored that into the ntp_error.

The ntp_error is used to control the frequency, and this constant adding
of the shift remainder would cause the large oscillation.

This patch instead adds another field to the timekeeping structure that
stores the remainder separately. On re-entry into update_wall_time(),
the remainder is added back onto the xtime_nsec after it is set to the
xtime.tv_nsec and restoring its original value.

This handles the rounding problem that the original commit addressed but
does not cause the large oscillation that it caused.

The new results of the branch annotation profiler is now:

 correct incorrect  %        Function                  File              Line
 ------- ---------  -        --------                  ----              ----
  736971      129   0 timekeeping_adjust             timekeeping.c        685
  736943      115   0 timekeeping_adjust             timekeeping.c        676


Cc: Roman Zippel <zippel@...ux-m68k.org>
Cc: John Stultz <johnstul@...ibm.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Signed-off-by: Steven Rostedt <rostedt@...dmis.org>

Index: linux-compile.git/kernel/time/timekeeping.c
===================================================================
--- linux-compile.git.orig/kernel/time/timekeeping.c
+++ linux-compile.git/kernel/time/timekeeping.c
@@ -37,6 +37,10 @@ struct timekeeper {
 
 	/* Clock shifted nano seconds remainder not stored in xtime.tv_nsec. */
 	u64	xtime_nsec;
+
+	/* remainder in xtime subtraction */
+	u64	xtime_nsec_rem;
+
 	/* Difference between accumulated time and NTP time in ntp
 	 * shifted nano seconds. */
 	s64	ntp_error;
@@ -84,6 +88,7 @@ static void timekeeper_setup_internals(s
 		((u64) interval * clock->mult) >> clock->shift;
 
 	timekeeper.xtime_nsec = 0;
+	timekeeper.xtime_nsec_rem = 0;
 	timekeeper.shift = clock->shift;
 
 	timekeeper.ntp_error = 0;
@@ -751,6 +756,12 @@ void update_wall_time(void)
 	timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.shift;
 
 	/*
+	 * Add back the remainder that was left over when adding +1 to
+	 * xtime.tv_nsec;
+	 */
+	timekeeper.xtime_nsec += timekeeper.xtime_nsec_rem;
+
+	/*
 	 * With NO_HZ we may have to accumulate many cycle_intervals
 	 * (think "ticks") worth of time at once. To do this efficiently,
 	 * we calculate the largest doubling multiple of cycle_intervals
@@ -797,12 +808,11 @@ void update_wall_time(void)
 
 	/*
 	 * Store full nanoseconds into xtime after rounding it up and
-	 * add the remainder to the error difference.
+	 * store the remainder to update xtime_nsec on the next iteration.
 	 */
 	xtime.tv_nsec =	((s64) timekeeper.xtime_nsec >> timekeeper.shift) + 1;
 	timekeeper.xtime_nsec -= (s64) xtime.tv_nsec << timekeeper.shift;
-	timekeeper.ntp_error +=	timekeeper.xtime_nsec <<
-				timekeeper.ntp_error_shift;
+	timekeeper.xtime_nsec_rem = timekeeper.xtime_nsec;
 
 	/*
 	 * Finally, make sure that after the rounding


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ