Message-ID: <CAKohpomaE3DDrHT8NBYh9TBt_MN6vTDtBFX_R9kEsZuA3ipEmw@mail.gmail.com>
Date:	Thu, 10 Jul 2014 15:47:54 +0530
From:	Viresh Kumar <viresh.kumar@...aro.org>
To:	Thomas Gleixner <tglx@...utronix.de>,
	Daniel Lezcano <daniel.lezcano@...aro.org>
Cc:	Frédéric Weisbecker <fweisbec@...il.com>,
	Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
	Lists linaro-kernel <linaro-kernel@...ts.linaro.org>,
	Linaro Networking <linaro-networking@...aro.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Kevin Hilman <khilman@...aro.org>,
	Santosh Shukla <santosh.shukla@...aro.org>,
	Arvind Chauhan <Arvind.Chauhan@....com>
Subject: [Bug] Spurious hrtimer-interrupts

Hi Thomas/Daniel et al,

This isn't about the problem I reported earlier, where you advised
to add ONESHOT_STOPPED mode: https://lkml.org/lkml/2014/5/9/508.
The above problem was about stopping the clock-event device when
it's not used anymore.

This ($subject) problem was initially spotted by Santosh on an Ivybridge V2,
12-core x86 server. I then reproduced it on a dual-core ARM
Exynos (though it isn't as frequent there as it was on x86).

Problem: Getting spurious ticks where hrtimer_interrupt() returns
without servicing any hrtimers.

Kernel hack to catch this: http://pastebin.com/bTM7nqDc (Over 3.16-rc3)
X86 boot logs: http://pastebin.com/E6axDnsa (search: hrtimer_interrupt)
/proc/cpuinfo: http://pastebin.com/uQx9TmsA

The furthest I could debug it is:

- Clockevent device is programmed for time 'x' seconds (verified this
  by storing next-event from within lapic_next_event()).
- Tick fires ~300 us before 'x'.
- Traversing the list of hrtimers doesn't turn up any pending
  hrtimer, so we simply return. Hence the *spurious* interrupt.

- Happens whether ticks are active or stopped (search for "tick-stopped"
  in the logs).
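
The sequence above can be modelled in user space. This is only a sketch of
the symptom, not the pastebin hack and not the kernel's hrtimer code: the
struct and function names are hypothetical. It walks a list of timers at
interrupt time, services the expired ones, and, when none has expired,
reports how early the interrupt arrived relative to the nearest expiry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a queued timer; not the kernel's struct hrtimer. */
struct model_timer {
	uint64_t expires_ns;	/* absolute expiry time */
	int fired;		/* set once the timer has been serviced */
};

/*
 * Model of an interrupt arriving at now_ns.
 * Returns the number of timers serviced. A return value of 0 means the
 * interrupt was spurious; *early_ns then holds how far ahead of the
 * nearest pending expiry it arrived (0 if no timer is pending at all).
 */
static size_t model_hrtimer_interrupt(struct model_timer *timers, size_t n,
				      uint64_t now_ns, uint64_t *early_ns)
{
	size_t serviced = 0;
	uint64_t next_ns = UINT64_MAX;
	size_t i;

	assert(early_ns != NULL);

	for (i = 0; i < n; i++) {
		if (timers[i].fired)
			continue;
		if (timers[i].expires_ns <= now_ns) {
			timers[i].fired = 1;	/* expired: service it */
			serviced++;
		} else if (timers[i].expires_ns < next_ns) {
			next_ns = timers[i].expires_ns;	/* nearest pending */
		}
	}

	*early_ns = (serviced == 0 && next_ns != UINT64_MAX)
			? next_ns - now_ns : 0;
	return serviced;
}
```

With a single timer expiring at 1000000 ns, an interrupt at 700000 ns
services nothing and reports 300000 ns early, which is the kind of event
being described here.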

Driver monitored for x86: arch/x86/kernel/apic/apic.c
Similar behavior observed on exynos with arm_arch_timer.c

I couldn't get any deeper into it to see what's going on. From the behavior,
it looks like the calculations we are doing with dev->mult/shift give
timeout <= next-event, whereas it should be >= ? Not at all sure though.
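
To make the suspicion concrete, here is a minimal user-space sketch of
mult/shift scaled math, where cycles = (ns * mult) >> shift and
mult = (freq << shift) / NSEC_PER_SEC. The helper names are made up for
illustration, and the frequency and shift used in the note below are
assumptions, not values read from either driver. Both the division that
produces mult and the final shift round down, so in this model the
programmed cycle count never represents more time than was requested.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Scale factor so that cycles = (ns * mult) >> shift. Rounds down. */
static uint64_t calc_mult(uint64_t freq_hz, unsigned int shift)
{
	assert(freq_hz > 0 && shift < 64);
	return (freq_hz << shift) / NSEC_PER_SEC;
}

/* Convert a nanosecond delta to device cycles. Rounds down again. */
static uint64_t ns_to_cycles(uint64_t delta_ns, uint64_t mult,
			     unsigned int shift)
{
	return (delta_ns * mult) >> shift;
}

/* Time the device really needs to count 'cycles', rounded down to ns. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
	assert(freq_hz > 0);
	return cycles * NSEC_PER_SEC / freq_hz;
}
```

For a hypothetical 24 MHz device with shift = 32, mult is 103079215, a
1000000 ns request becomes 23999 cycles, and those cycles elapse after
about 999958 ns, i.e. slightly before the requested expiry. The error in
this model is on the order of one device cycle, so by itself it would not
obviously account for ~300 us.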

Reported-by: Santosh Shukla <santosh.shukla@...aro.org>

Note: Even the hacky patchset that tried to disable the clockevent device
when not used anymore isn't able to fix it:
https://lkml.org/lkml/2014/5/9/99..

--
viresh