Message-ID: <fb0dba1d-0edf-52a8-b546-750a68e55323@gentwo.org>
Date: Wed, 2 Jul 2025 17:25:38 -0700 (PDT)
From: "Christoph Lameter (Ampere)" <cl@...two.org>
To: Thomas Gleixner <tglx@...utronix.de>
cc: Christoph Lameter via B4 Relay <devnull+cl.gentwo.org@...nel.org>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>, Ingo Molnar <mingo@...nel.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, sh@...two.org,
Darren Hart <dvhart@...radead.org>
Subject: Re: [PATCH] Skew tick for systems with a large number of
processors
On Thu, 3 Jul 2025, Thomas Gleixner wrote:
> The above aside. As you completely failed to provide at least the
> minimal historical background in the change log, let me fill in the
> blanks.
>
> commit 3704540b4829 ("tick management: spread timer interrupt") added the
> skew unconditionally in 2007 to avoid lock contention on xtime lock.
Right, but that was only one reason why the timer interrupts were
staggered.
> commit af5ab277ded0 ("clockevents: Remove the per cpu tick skew")
> removed it in 2010 because the xtime lock contention was gone and the
> skew affected the power consumption of slightly loaded _large_ servers.
But the tick also executes other code that can cause contention. Why
merge such an obviously problematic patch without considering the reasons
for the 2007 patch?
> commit 5307c9556bc1 ("tick: Add tick skew boot option") brought it back
> with a command line option to address contention and jitter issues on
> larger systems.
And then issues resulted because the scaling issues were not
considered when merging the 2010 patch.
> So while you preserved the behaviour of the command line option in the
> most obscure way, you did not even make an attempt to explain why this
> change does not bring back the issues which caused the removal in commit
> af5ab277ded0 or why they are irrelevant today.
As pointed out in the patch description: the synchronized tick (aside from
the jitter) also causes power spikes on systems with many cores, which can
cause system instabilities.
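To illustrate what skewing means here, a minimal sketch, not the kernel
implementation: each CPU's first tick is delayed by a per-CPU offset so that
the ticks are spread over the first half of one tick period instead of all
firing at the same instant. The helper name skew_offset_ns() and its
parameters are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: return the delay in nanoseconds
 * added to the first tick of `cpu`, so that the ticks of all `ncpus` CPUs
 * are spread evenly over the first half of one tick period `tick_ns`.
 */
static uint64_t skew_offset_ns(uint64_t tick_ns, unsigned int ncpus,
                               unsigned int cpu)
{
    assert(ncpus > 0 && cpu < ncpus);

    /* Spacing between the ticks of neighbouring CPUs. */
    uint64_t step = (tick_ns / 2) / ncpus;

    /* CPU 0 keeps offset 0; the last CPU stays below tick_ns / 2. */
    return step * cpu;
}
```

With a 4 ms tick (250 Hz) and 4 CPUs, this gives offsets of 0, 0.5, 1.0 and
1.5 ms, so no two CPUs take their tick at the same moment. The trade-off
raised in the quoted text still applies: spreading the ticks reduces the
window in which all CPUs can be idle together.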
> "Scratches my itch" does not work and you know that. This needs to be
> consolidated both on the implementation side and also on the user
> side.
We can get to that, but I at least need some direction on how to approach
this and to figure out the concerns that exist. Frankly, my initial idea was
just to remove the buggy patches, since they caused a regression in
performance and system stability, but I guess there were power savings
concerns.
How can we address this issue in a better way then? The kernel should not
come up all wobbly and cause power spikes every tick.