Message-ID: <CAKohpomaE3DDrHT8NBYh9TBt_MN6vTDtBFX_R9kEsZuA3ipEmw@mail.gmail.com>
Date: Thu, 10 Jul 2014 15:47:54 +0530
From: Viresh Kumar <viresh.kumar@...aro.org>
To: Thomas Gleixner <tglx@...utronix.de>,
Daniel Lezcano <daniel.lezcano@...aro.org>
Cc: Frédéric Weisbecker <fweisbec@...il.com>,
Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
Lists linaro-kernel <linaro-kernel@...ts.linaro.org>,
Linaro Networking <linaro-networking@...aro.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Kevin Hilman <khilman@...aro.org>,
Santosh Shukla <santosh.shukla@...aro.org>,
Arvind Chauhan <Arvind.Chauhan@....com>
Subject: [Bug] Spurious hrtimer-interrupts
Hi Thomas/Daniel et al,
This isn't about the problem I reported earlier, for which you advised
adding a ONESHOT_STOPPED mode: https://lkml.org/lkml/2014/5/9/508.
That problem was about stopping the clock-event device when it's no
longer used.
This ($subject) problem was initially spotted by Santosh on an Ivy Bridge
v2, 12-core x86 server. I then reproduced it on a dual-core ARM Exynos
(though it isn't as frequent there as it was on x86).
Problem: Getting spurious ticks where hrtimer_interrupt() returns
without servicing any hrtimers.
Kernel hack to catch this: http://pastebin.com/bTM7nqDc (on top of 3.16-rc3)
X86 boot logs: http://pastebin.com/E6axDnsa (search: hrtimer_interrupt)
/proc/cpuinfo: http://pastebin.com/uQx9TmsA
This is as far as I could debug it:
- The clockevent device is programmed for time 'x' seconds (verified this
by storing next-event from within lapic_next_event()).
- The tick fires ~300 us before 'x'.
- Traversing the list of hrtimers finds no hrtimer that has expired, so
we simply return; hence a *spurious* interrupt.
- It happens both when ticks are active and when they are stopped (search
for "tick-stopped" in the logs).
Driver monitored for x86: arch/x86/kernel/apic/apic.c
Similar behavior observed on Exynos with arm_arch_timer.c.
I couldn't get any deeper into it to see what's going on. From the behavior,
it looks like the calculations we are doing with dev->mult/shift give a
timeout <= next-event, whereas it should be >=? Not at all sure though.
Reported-by: Santosh Shukla <santosh.shukla@...aro.org>
Note: Even the hacky patchset that tried to disable the clockevent device
when it's no longer used doesn't fix it:
https://lkml.org/lkml/2014/5/9/99
--
viresh