Message-ID: <20260205063714.2579-1-hdanton@sina.com>
Date: Thu,  5 Feb 2026 14:37:11 +0800
From: Hillf Danton <hdanton@...a.com>
To: 连子涵 <17317795071@....com>
Cc: tglx@...utronix.de,
	"Christoph Lameter (Ampere)" <cl@...two.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [Question] Voltage droop from synchronized timer interrupts(tick) on many-core SoCs leads to system instability

On Thu, 5 Feb 2026 12:52:04 +0800 (CST) 连子涵 wrote:
> Hi all,
> We have observed a critical voltage droop issue on large-core-count SoC platforms (e.g., 64+ cores) that appears to stem directly from the synchronized periodic timer interrupts (ticks) in the Linux kernel.
> 
> In our testing and power simulations, we found the following:
> When all CPU cores enter the timer interrupt handler simultaneously, there is a sharp, instantaneous power surge, followed by continuous power fluctuations during the interrupt handling window (which lasts several microseconds). Together these lead to significant voltage droop. In severe cases, this droop can cause system instability or even prevent the OS from booting.
> 
> We understand that enabling skew_tick=1 effectively mitigates this by staggering the per-CPU tick timers. However, in certain deployment scenarios, modifying any kernel boot parameter, including skew_tick, is not permitted.
> 
> Given this constraint, we would greatly appreciate your insights on the following technical questions: 
> 1. Why does the timer interrupt path consume so much power and exhibit such large instantaneous variations? Our power simulation shows that the average power during timer interrupt handling is comparable to that of the Dhrystone benchmark.
> 2. What is the typical duration of a single timer interrupt handler (tick_nohz_handler, etc.) on a modern x86 or ARM core? Is it generally on the order of a few microseconds?
> 3. Beyond skew_tick=1, are there other kernel mechanisms or runtime strategies that could reduce the power impact of synchronized timer events? Are there plans in future kernel versions to address this issue more fundamentally, especially for many-core platforms?
> 
> 
> Thank you very much for your time and expertise. 
> 
This sounds like a known issue; see the comments in the 2025 thread [1].

[1] Subject: Re: [PATCH] Skew tick for systems with a large number of processors
https://lore.kernel.org/lkml/87sejew87r.ffs@tglx/
> 
> Best regards, 
> Zihan Lian <17317795071@....com>
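
For readers unfamiliar with the staggering that skew_tick=1 performs (mentioned in the quoted question), here is a minimal sketch of the arithmetic. It assumes the scheme I recall from kernel/time/tick-sched.c, where each CPU's tick is delayed by its CPU number times (half a tick period divided by the number of possible CPUs). The function name, HZ=250, and the 64-CPU count are illustrative assumptions, not values from this thread, and the kernel code itself should be checked before relying on this.

```python
NSEC_PER_SEC = 1_000_000_000


def skewed_tick_offsets_ns(hz: int, num_cpus: int) -> list[int]:
    """Return an assumed per-CPU tick offset in nanoseconds.

    Hypothetical helper: models spreading the ticks of all CPUs
    evenly across the first half of one tick period.
    """
    if hz <= 0 or num_cpus <= 0:
        raise ValueError("hz and num_cpus must be positive")
    tick_nsec = NSEC_PER_SEC // hz          # length of one tick period
    step = (tick_nsec // 2) // num_cpus     # spacing between neighbouring CPUs
    return [cpu * step for cpu in range(num_cpus)]


# Assumed example values: HZ=250 and 64 CPUs.
offsets = skewed_tick_offsets_ns(hz=250, num_cpus=64)
```

With these assumed values the tick period is 4 ms, so neighbouring CPUs would be spaced 31250 ns apart and no two CPUs would enter the tick handler at the same instant. Whether that spacing is wide enough relative to the handler duration asked about in question 2 is not something this sketch can answer.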
