Message-Id: <20160710122345.13061-1-nicstange@gmail.com>
Date:	Sun, 10 Jul 2016 14:23:41 +0200
From:	Nicolai Stange <nicstange@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Ingo Molnar <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
	x86@...nel.org, John Stultz <john.stultz@...aro.org>,
	Borislav Petkov <bp@...e.de>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	"Christopher S. Hall" <christopher.s.hall@...el.com>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	linux-kernel@...r.kernel.org, Nicolai Stange <nicstange@...il.com>
Subject: [PATCH 0/4] avoid double timer interrupt with nohz and Intel TSC

With a single task running on a NOHZ CPU on an Intel Haswell, I noticed
that I got not just the one expected local_timer APIC interrupt per second,
but at least two.

Further tracing showed that the first one precedes the programmed deadline
by up to ~50us and hence does nothing except reprogram the TSC deadline
clockevent device to trigger again shortly thereafter.

FYI, the trace looks like this:

  <...>-2938  [007] d.h.   420.753164: local_timer_entry: vector=239
  <...>-2938  [007] d.h.   420.753164: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-2938  [007] d.h.   420.753184: local_timer_entry: vector=239
  <...>-2938  [007] d.h.   420.753184: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-2938  [007] d.h.   420.753195: tick_sched_timer <-__hrtimer_run_queues
  <...>-2938  [007] d.h.   421.752170: local_timer_entry: vector=239
  <...>-2938  [007] d.h.   421.752171: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-2938  [007] d.h.   421.752202: local_timer_entry: vector=239
  <...>-2938  [007] d.h.   421.752202: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-2938  [007] d.h.   421.752202: tick_sched_timer <-__hrtimer_run_queues

It turns out that the TSC deadline gets programmed too early because of
inaccuracies in some frequency calculations. These become significant once
the timer periods become large, as is the case for nohz with one task
(delta = 10^9ns).
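
To illustrate why a small frequency error only matters for long periods, here
is a minimal sketch of a mult/shift style conversion from nanoseconds to
device cycles. It is not taken from the patches: the helper names, the shift
value and the frequencies used below are all hypothetical, chosen only to show
the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Scaled factor such that cycles = (ns * mult) >> shift at assumed_hz. */
static uint64_t freq_to_mult(uint64_t assumed_hz, unsigned int shift)
{
	return (assumed_hz << shift) / NSEC_PER_SEC;
}

/* Convert a nanosecond delta to device cycles using the assumed frequency. */
static uint64_t ns_to_cycles(uint64_t delta_ns, uint64_t mult,
			     unsigned int shift)
{
	return (delta_ns * mult) >> shift;
}

/*
 * Nanoseconds by which the event fires too early when the device was
 * programmed for assumed_hz but really ticks at real_hz (hypothetical
 * model: the only error source is the frequency mismatch plus rounding).
 */
static int64_t early_ns(uint64_t delta_ns, uint64_t assumed_hz,
			uint64_t real_hz, unsigned int shift)
{
	uint64_t cycles, elapsed_ns;

	assert(real_hz != 0);
	cycles = ns_to_cycles(delta_ns, freq_to_mult(assumed_hz, shift), shift);
	elapsed_ns = cycles * NSEC_PER_SEC / real_hz;
	return (int64_t)delta_ns - (int64_t)elapsed_ns;
}
```

With delta_ns = 10^9, an assumed frequency of 2.4 GHz and a real frequency
50 ppm higher (both made-up numbers), early_ns() comes out at roughly 50us.
For a 1ms delta the same relative error amounts to only about 50ns, which is
why the effect goes unnoticed with short timer periods.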

The first three patches address inaccuracies entering the TSC deadline
clockevent devices' frequency.

The fourth patch is the most important one as it addresses the error
of largest relative magnitude. It is caused by the assumption in the
clockevents core that the ratio of the monotonic clock's frequency to that
of the clockevent device is constant. Since the monotonic clock's
frequency gets dynamically adjusted in order to compensate for NTP errors,
this is not true.
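
As a rough sketch of the effect (not the code from [4/4]): if the monotonic
clock is steered to run adj_ppb parts per billion faster or slower than the
unadjusted hardware clock, a delta given in monotonic nanoseconds corresponds
to a different number of raw nanoseconds. The function name and the adj_ppb
parameter are hypothetical and serve only to illustrate the arithmetic.

```c
#include <stdint.h>

#define PPB 1000000000LL

/*
 * Raw (unadjusted) nanoseconds that elapse while the monotonic clock
 * advances by delta_mono_ns, assuming the monotonic clock runs adj_ppb
 * parts per billion faster (positive) or slower (negative) than raw.
 */
static int64_t mono_to_raw_ns(int64_t delta_mono_ns, int64_t adj_ppb)
{
	return delta_mono_ns * PPB / (PPB + adj_ppb);
}
```

For example, with the monotonic clock slowed down by a made-up 20 ppm
(adj_ppb = -20000), a delta of 10^9 monotonic nanoseconds spans 20us more in
raw time. A device programmed under the assumption of a constant ratio would
therefore expire about 20us before the monotonic deadline.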

With this patchset applied, the trace looks like this:

  <...>-23609 [007] d.h.  1811.586658: local_timer_entry: vector=239
  <...>-23609 [007] d.h.  1811.586680: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-23609 [007] d.h.  1811.586680: tick_sched_timer <-__hrtimer_run_queues
  <...>-23609 [007] d.h.  1812.585659: local_timer_entry: vector=239
  <...>-23609 [007] d.h.  1812.585666: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-23609 [007] d.h.  1812.585666: tick_sched_timer <-__hrtimer_run_queues
  <...>-23609 [007] d.h.  1813.584661: local_timer_entry: vector=239
  <...>-23609 [007] d.h.  1813.584668: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-23609 [007] d.h.  1813.584668: tick_sched_timer <-__hrtimer_run_queues

Please note that the first three TSC patches might not be necessary to
get this result. In fact, [3/4] ("arch, x86, tsc: inform TSC deadline
clockevent device about recalibration") is somewhat counterproductive in
the sense that on my system, it usually corrects the TSC deadline device's
frequency towards lower values and thus facilitates the too-early interrupt
behaviour initially described. However, I decided to send them along with
the fourth patch because
 - I tested the fourth patch in this setting, and
 - I believe that greater accuracy of the TSC deadline device is
   worthwhile on its own.

Applicable to linux-next-20160708. The individual patches don't depend on
each other.

Nicolai Stange (4):
  arch, x86, tsc deadline clockevent dev: reduce frequency roundoff
    error
  arch, x86, tsc deadline clockevent dev: reduce TSC_DIVISOR to 2
  arch, x86, tsc: inform TSC deadline clockevent device about
    recalibration
  kernel/time/clockevents: compensate for monotonic clock's dynamic
    frequency

 arch/x86/include/asm/apic.h |  1 +
 arch/x86/kernel/apic/apic.c | 29 ++++++++++++++++++++++++--
 arch/x86/kernel/tsc.c       |  4 ++++
 kernel/time/clockevents.c   |  1 +
 kernel/time/timekeeping.c   | 50 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/time/timekeeping.h   |  1 +
 6 files changed, 84 insertions(+), 2 deletions(-)

-- 
2.9.0
