Message-ID: <1306972711.11492.23.camel@work-vm>
Date:	Wed, 01 Jun 2011 16:58:31 -0700
From:	john stultz <johnstul@...ibm.com>
To:	Bjorn Helgaas <bhelgaas@...gle.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: /proc/stat btime accuracy problem

On Wed, 2011-06-01 at 17:35 -0600, Bjorn Helgaas wrote:
> On Wed, Jun 1, 2011 at 4:35 PM, john stultz <johnstul@...ibm.com> wrote:
> > On Wed, 2011-06-01 at 14:50 -0600, Bjorn Helgaas wrote:
> >> timekeeping_init() basically does the following:
> >>
> >>     xtime = RTC
> >>     if (arch implements read_boot_clock())
> >>         wall_to_monotonic = -read_boot_clock()
> >>     else
> >>       wall_to_monotonic = -xtime
> >>
> >> So wall_to_monotonic records some approximation of the system boot
> >> time, which is then used to derive the "btime" reported in /proc/stat.
> >>
> >> The problem I'm seeing is that xtime is updated on timer ticks, so
> >> uninterruptible code, like kernel serial printk, makes us miss ticks,
> >> so xtime falls behind the RTC.
> >
> > Huh. So this sort of issue was common back when we had tick-based
> > timekeeping (in combination with troubled hardware), but with the
> > current clocksource based timekeeping, occasional lost ticks shouldn't
> > really affect time.
> 
> Makes sense.  Your presentation here was a great help:
>   http://sr71.net/~jstultz/tod/ols-presentation-final.pdf
> 
> > Can you explain a bit more about what kind of hardware this is happening
> > on, and what clocksource is being used?
> 
> Sure.  This is an x86 box.  Normally we're using the TSC clocksource,
> and I don't think the issue happens after that.  I guess my
> experimentation so far has been with uninterruptible time before we
> register *any* clocksource (or at least before I see any "Switching to
> clocksource" messages).

Huh. 

So yeah, if we are very early at boot, we're likely using the jiffies
clocksource, which is basically a software-based tick counter, which
would be prone to lost-ticks issues if irqs were disabled for too long.
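
A rough way to picture that failure mode is the userspace model below. It
is only a sketch, not kernel code: the function names, the fixed HZ, and
the assumption that every tick fired while irqs are masked collapses into
a single pending interrupt are all illustrative choices made for this
example.

```c
#include <assert.h>

/*
 * Hypothetical model of a jiffies-style tick counter (not kernel code).
 * A timer fires every 1000/hz ms for total_ms milliseconds. Interrupts
 * are masked during [off_start_ms, off_start_ms + off_len_ms). Assumed
 * here: ticks that fire while masked collapse into one pending interrupt
 * that is delivered once interrupts are enabled again.
 */
long ticks_counted(long hz, long total_ms, long off_start_ms, long off_len_ms)
{
    assert(hz > 0 && 1000 % hz == 0);   /* keep the tick period a whole ms */

    long period_ms = 1000 / hz;
    long off_end_ms = off_start_ms + off_len_ms;
    long counted = 0;
    int pending = 0;

    for (long t = period_ms; t <= total_ms; t += period_ms) {
        if (t >= off_start_ms && t < off_end_ms) {
            pending = 1;                /* latched once; further ticks are lost */
            continue;
        }
        if (pending) {                  /* irqs are back on: deliver the one */
            counted++;                  /* latched tick before this one */
            pending = 0;
        }
        counted++;
    }
    if (pending)
        counted++;                      /* window ran to the end of the run */
    return counted;
}

/* How far (in ms) the tick-derived time falls behind real elapsed time. */
long lost_ms(long hz, long total_ms, long off_start_ms, long off_len_ms)
{
    long expected = total_ms * hz / 1000;
    long seen = ticks_counted(hz, total_ms, off_start_ms, off_len_ms);
    return (expected - seen) * (1000 / hz);
}
```

With hz = 100, a 10 second run and a 3 second irqs-off window, this model
counts 701 of the 1000 expected ticks, so the tick-derived time ends up
2990 ms behind. A clocksource that reads a free-running hardware counter
would not lose that time, since it does not depend on each tick being
delivered.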

Do you know if this is a relatively new issue?

My first instinct is "don't do that!" to whatever driver is disabling
irqs for so long. Do you know what's actually causing these long irq off
periods?

I assume you're noticing this offset by seeing that CLOCK_REALTIME is
off from the RTC right after boot? How severe is this? The RTC read only
has one-second granularity, so an error of up to ~1 second is possible
right at boot. For the offset to be noticeable, it must then amount to
many seconds' worth of lost ticks, right?
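
One possible way to put a number on the offset from userspace is sketched
below. It assumes a Linux system where /proc/stat carries a "btime" line;
the helper names parse_btime() and boot_time_from_clocks() are made up
for this example, and the second helper is only an approximation because
CLOCK_MONOTONIC may not advance while the machine is suspended.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: find the "btime <seconds>" line in the text of
 * /proc/stat and return its value, or -1 if no such line is present.
 */
long long parse_btime(const char *stat_text)
{
    const char *p = stat_text;

    while (p && *p) {
        long long v;

        if (sscanf(p, "btime %lld", &v) == 1)
            return v;
        p = strchr(p, '\n');            /* advance to the next line */
        if (p)
            p++;
    }
    return -1;
}

/*
 * Hypothetical helper: estimate the boot time (seconds since the epoch)
 * as CLOCK_REALTIME minus CLOCK_MONOTONIC. Returns -1 on failure.
 */
long long boot_time_from_clocks(void)
{
    struct timespec rt, mono;

    if (clock_gettime(CLOCK_REALTIME, &rt) != 0 ||
        clock_gettime(CLOCK_MONOTONIC, &mono) != 0)
        return -1;
    return (long long)rt.tv_sec - (long long)mono.tv_sec;
}
```

Reading /proc/stat into a buffer, passing it to parse_btime(), and
comparing the result against the time the machine is known to have booted
(for example from an external log) would show how many seconds btime is
off. Given the one-second RTC granularity, a difference of about a second
is expected noise rather than evidence of lost ticks.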

thanks
-john





