Message-ID: <20090526231313.GB27218@linux-sh.org>
Date:	Wed, 27 May 2009 08:13:13 +0900
From:	Paul Mundt <lethal@...ux-sh.org>
To:	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Linus Walleij <linus.ml.walleij@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	Andrew Victor <linux@...im.org.za>,
	Haavard Skinnemoen <hskinnemoen@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-sh@...r.kernel.org,
	linux-arm-kernel@...ts.arm.linux.org.uk,
	John Stultz <johnstul@...ux.vnet.ibm.com>
Subject: Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().

On Wed, May 27, 2009 at 08:08:55AM +0900, Paul Mundt wrote:
> On Tue, May 26, 2009 at 10:17:02PM +0200, Thomas Gleixner wrote:
> > On Tue, 26 May 2009, Peter Zijlstra wrote:
> > > On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> > > > The definition of "rating" from the kerneldoc does not
> > > > seem to imply that, it's a subjective measure AFAICT.
> > 
> >   Right, there is no rating threshold defined that would allow us to
> >   deduce that. The TSC on x86, which might be unreliable but is still
> >   usable as sched_clock, has an initial rating of 300, which can be
> >   changed later on to 0 when the TSC is unusable as a time-of-day
> >   source. In that case clock is replaced by HPET, which has a
> >   rating > 100 but is definitely not a good choice for sched_clock.
> > 
> > > > Else you might want an additional criterion, like
> > > > cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> > > > (however you do that the best way)
> > > > so you don't pick something
> > > > that isn't at least substantially faster than the jiffy counter?
> > 
> >   What we can do is add another flag to the clocksource e.g.
> >   CLOCK_SOURCE_USE_FOR_SCHED_CLOCK and check this instead of the
> >   rating.
> > 
> Ok, so based on this and John's locking concerns, how about something
> like this? It doesn't handle the wrapping cases, but I wonder if we
> really want to add that amount of logic to sched_clock() in the first
> place. Clocksources that wrap frequently could either leave the flag
> unset, or do something similar to the TSC code where the cyc2ns shift is
> used. If this is something we want to handle generically, then I'll have
> a go at generalizing the TSC cyc2ns scaling bits for the next spin.
> 
Let's try that again.
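
As an illustration of the criterion suggested in the quoted discussion
(one cycle should convert to far fewer nanoseconds than one jiffy), here
is a minimal userspace sketch. The demo_ names, the struct, and the
"factor" margin are hypothetical and not part of the patch; only the
(cycles * mult) >> shift form mirrors how cyc2ns() scales cycles.

```c
#include <stdint.h>

/* Hypothetical stand-in for the mult/shift pair a clocksource carries. */
struct demo_clocksource {
	uint32_t mult;	/* cycle-to-nanosecond multiplier */
	uint32_t shift;	/* cycle-to-nanosecond divisor, as a power of two */
};

/* Fixed-point conversion in the same form as cyc2ns(). */
static uint64_t demo_cyc2ns(const struct demo_clocksource *cs, uint64_t cycles)
{
	return (cycles * cs->mult) >> cs->shift;
}

/* Length of one jiffy in nanoseconds for a given HZ. */
static uint64_t demo_jiffy_ns(unsigned int hz)
{
	return 1000000000ULL / hz;
}

/*
 * Returns 1 when one cycle is at least "factor" times shorter than one
 * jiffy. The margin is an assumption; the thread does not define what
 * "much less than" should mean.
 */
static int demo_finer_than_jiffies(const struct demo_clocksource *cs,
				   unsigned int hz, uint64_t factor)
{
	return demo_cyc2ns(cs, 1) * factor <= demo_jiffy_ns(hz);
}
```

With shift = 8, a 1 MHz counter has mult = 256000 (1000 ns per cycle)
and passes the check at HZ=100 with factor 100, while a 200 Hz counter
(mult = 1280000000, 5 ms per cycle) does not. Note that for counters
faster than 1 GHz, demo_cyc2ns(cs, 1) truncates to 0, so a real check
would want to convert a larger cycle count.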

---

 include/linux/clocksource.h |    2 ++
 kernel/sched_clock.c        |   22 ++++++++++++++++++++++
 kernel/time/clocksource.c   |    2 +-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index c56457c..cfd873e 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -203,6 +203,7 @@ struct clocksource {
 };
 
 extern struct clocksource *clock;	/* current clocksource */
+extern spinlock_t clocksource_lock;
 
 /*
  * Clock source flags bits::
@@ -212,6 +213,7 @@ extern struct clocksource *clock;	/* current clocksource */
 
 #define CLOCK_SOURCE_WATCHDOG			0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
+#define CLOCK_SOURCE_USE_FOR_SCHED_CLOCK	0x40
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e1d16c9..c7027cd 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -30,6 +30,7 @@
 #include <linux/percpu.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
+#include <linux/clocksource.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -38,6 +39,27 @@
  */
 unsigned long long __attribute__((weak)) sched_clock(void)
 {
+	/*
+	 * Use the current clocksource when it becomes available later in
+	 * the boot process. As this needs to be fast, we only make a
+	 * single pass at grabbing the spinlock. If the clock is changing
+	 * out from underneath us, fall back on jiffies and try it again
+	 * the next time around.
+	 */
+	if (clock && _raw_spin_trylock(&clocksource_lock)) {
+		/*
+		 * Only use clocksources suitable for sched_clock()
+		 */
+		if (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK) {
+			cycle_t now = cyc2ns(clock, clocksource_read(clock));
+			_raw_spin_unlock(&clocksource_lock);
+			return now;
+		}
+
+		_raw_spin_unlock(&clocksource_lock);
+	}
+
+	/* If all else fails, fall back on jiffies */
 	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
 					* (NSEC_PER_SEC / HZ);
 }
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 80189f6..437a6cf 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -127,7 +127,7 @@ static struct clocksource *curr_clocksource = &clocksource_jiffies;
 static struct clocksource *next_clocksource;
 static struct clocksource *clocksource_override;
 static LIST_HEAD(clocksource_list);
-static DEFINE_SPINLOCK(clocksource_lock);
+DEFINE_SPINLOCK(clocksource_lock);
 static char override_name[32];
 static int finished_booting;
 
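
For anyone who wants to play with the single-pass trylock-then-fallback
idea from the sched_clock() hunk outside the kernel, here is a rough
userspace sketch using pthreads. Everything named demo_ is hypothetical,
the mutex stands in for clocksource_lock, the tick counter stands in for
jiffies, and HZ=100 is assumed. It is not the patch itself, just the
same control flow.

```c
#include <pthread.h>
#include <stdint.h>

#define DEMO_NS_PER_TICK 10000000ULL	/* assumes HZ = 100 */

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t (*demo_fast_read)(void);	/* NULL until a fast source exists */
static uint64_t demo_coarse_ticks;		/* stands in for jiffies */

/* Example fast source that returns a fixed nanosecond value. */
static uint64_t demo_fixed_read(void)
{
	return 123456789ULL;
}

static uint64_t demo_sched_clock(void)
{
	/* One attempt only: never spin or sleep waiting for the lock. */
	if (pthread_mutex_trylock(&demo_lock) == 0) {
		uint64_t (*read_ns)(void) = demo_fast_read;

		if (read_ns) {
			uint64_t now = read_ns();

			pthread_mutex_unlock(&demo_lock);
			return now;
		}
		pthread_mutex_unlock(&demo_lock);
	}

	/* Lock contended or no fast source yet: use the coarse counter. */
	return demo_coarse_ticks * DEMO_NS_PER_TICK;
}
```

The caller sees the coarse value whenever the lock is held elsewhere
(for example while the source is being switched) and picks up the fast
source again on a later call. One difference from the hunk above: the
sketch checks the source pointer only after taking the lock, whereas
the patch tests "clock" before the trylock.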
