Message-ID: <3edc0370-dc18-69f8-0862-f0a317db9118@gentwo.org>
Date: Mon, 9 Feb 2026 10:33:47 -0800 (PST)
From: "Christoph Lameter (Ampere)" <cl@...two.org>
To: Hillf Danton <hdanton@...a.com>
cc: 连子涵 <17317795071@....com>, tglx@...utronix.de,
linux-kernel@...r.kernel.org
Subject: Re: [Question] Voltage droop from synchronized timer interrupts(tick)
on many-core SoCs leads to system instability
On Thu, 5 Feb 2026, Hillf Danton wrote:
> On Thu, 5 Feb 2026 12:52:04 +0800 (CST) 连子涵 wrote:
> > Hi all,
> > We have observed a critical voltage droop issue on large-core-count
> > SoC platforms (e.g., 64+ cores) that appears to stem directly from the
> > synchronized periodic timer interrupts (tick) in the Linux kernel.
> >
> > In our testing and power simulations, we found that:
> > When all CPU cores enter the timer interrupt handler simultaneously, there is a sharp, instantaneous power surge and continuous power fluctuations during the interrupt handling window (which lasts several microseconds), leading to significant voltage droop. In severe cases, this droop can cause system instability or even prevent the OS from booting.
> >
> > We understand that enabling skew_tick=1 effectively mitigates this by
> > staggering the per-CPU tick timers. However, in certain deployment
> > scenarios, modifying any kernel boot parameter—including skew_tick—is
> > not permitted.
You could build a custom kernel that enables it by default.

Could you post test results that might convince us to make skew_tick the
default for certain configurations?

I have had trouble getting good power readings on smaller configurations
because the SoC power state fluctuated. If we had results showing that
skew_tick does no harm at low core counts and helps at high ones, then
we could change the default.
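
For reference, a rough user-space sketch of how the skew spreads the
ticks, written from memory of tick_setup_sched_timer() in
kernel/time/tick-sched.c (please check it against your tree). The helper
name skew_offset_ns() and its parameters are made up for illustration;
the assumption is that each CPU's first tick is pushed out by an even
share of half a tick period.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: offset in nanoseconds that
 * a given CPU's tick would be delayed by when skewing is enabled.
 * Assumption: CPUs are spread evenly over the first half of one tick
 * period, i.e. step = (tick_nsec / 2) / ncpus and offset = step * cpu.
 */
static uint64_t skew_offset_ns(uint64_t tick_nsec, unsigned int ncpus,
                               unsigned int cpu)
{
    assert(ncpus > 0 && cpu < ncpus);

    /* Spacing between the ticks of two neighbouring CPUs (integer division). */
    uint64_t step = (tick_nsec >> 1) / ncpus;

    return step * cpu;
}
```

With HZ=1000 (tick_nsec = 1000000) and 64 CPUs, the spacing between
neighbouring CPUs works out to 7812 ns under these assumptions.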
> > Given this constraint, we would greatly appreciate your insights on
> > the following technical questions:
> >
> > 1. Why does the timer interrupt path consume so much power and exhibit
> > such large instantaneous variations? Our power simulation shows that
> > the average power during timer interrupt handling is comparable to
> > that of the Dhrystone benchmark.
Because all processors need to be active and running at the same time. The
SoC must ramp up power instantly, and power then drops again just as
rapidly. This is a pretty bad scenario that forces SoC manufacturers to
raise the default supply voltage to the SoC to cope with this instability.
> > 2. What is the typical duration of a single timer interrupt handler
> > (tick_nohz_handler, etc.) on a modern x86 or ARM core? Is it generally
> > on the order of a few microseconds?
My estimate would be 2-10 microseconds, but Thomas may know better.
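
To put that estimate next to the skew spacing, here is a back-of-envelope
sketch, not a measurement. It assumes every handler starts exactly at its
programmed time and runs for a fixed duration, and that skewing spreads
the CPUs evenly over half a tick period. The function name
max_concurrent_handlers() and its parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical estimate: worst-case number of CPUs inside the tick
 * handler at the same instant.
 * Assumptions: fixed handler duration, no jitter, and (when skew != 0)
 * an even spread of the CPUs over the first half of the tick period.
 */
static unsigned int max_concurrent_handlers(uint64_t tick_nsec,
                                            unsigned int ncpus,
                                            uint64_t handler_nsec,
                                            int skew)
{
    if (ncpus == 0 || handler_nsec == 0)
        return 0;

    /* Without skew all ticks are aligned, so every CPU overlaps. */
    if (!skew)
        return ncpus;

    /* Spacing between neighbouring CPUs' ticks. */
    uint64_t step = (tick_nsec >> 1) / ncpus;
    if (step == 0)
        return ncpus;

    /* Handlers that start within one handler duration: ceil(d / step). */
    uint64_t overlap = (handler_nsec + step - 1) / step;

    return overlap < ncpus ? (unsigned int)overlap : ncpus;
}
```

Under these assumptions, 64 CPUs at HZ=1000 with a 10 microsecond handler
give 64 simultaneous handlers without skew and 2 with it; with a
2 microsecond handler the skewed case drops to 1.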
> > 3. Beyond skew_tick=1, are there other kernel mechanisms or runtime
> > strategies that could reduce the power impact of synchronized timer
> > events? Are there plans in future kernel versions to address this
> > issue more fundamentally, especially for many-core platforms?
The SoC could be modified to delay interrupt delivery if too many
interrupts hit the CPUs at once?