Message-ID: <20250128063301.3879317-1-jstultz@google.com>
Date: Mon, 27 Jan 2025 22:32:52 -0800
From: John Stultz <jstultz@...gle.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: John Stultz <jstultz@...gle.com>, Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>, Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, Stephen Boyd <sboyd@...nel.org>,
Yury Norov <yury.norov@...il.com>, Bitao Hu <yaoma@...ux.alibaba.com>,
Andrew Morton <akpm@...ux-foundation.org>, kernel-team@...roid.com
Subject: [RFC][PATCH 0/3] DynamicHZ: Configuring the timer tick rate at boot time
The HZ value has long been a compile time constant. This is
really useful, as a lot of hotpath code uses HZ when setting
timers (needing to convert nanoseconds to ticks), and that
division is much faster with a compile time constant divisor.
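
As a rough illustration of the point above, here is a minimal
standalone sketch (not kernel code). The helper names
ns_to_ticks_const() and ns_to_ticks_runtime() are hypothetical.
With a constant divisor, compilers can typically replace the
divide with a multiply and shift; with a run time divisor, they
generally cannot.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 1000U                       /* compile time tick rate */
#define NSEC_PER_SEC 1000000000ULL
#define TICK_NSEC (NSEC_PER_SEC / HZ)  /* folded to a constant at build time */

/* Divisor is a compile time constant, so no divide instruction is required. */
static uint64_t ns_to_ticks_const(uint64_t ns)
{
    return ns / TICK_NSEC;
}

/* Divisor depends on a run time value, so a real division is performed. */
static uint64_t ns_to_ticks_runtime(uint64_t ns, uint32_t hz)
{
    assert(hz > 0 && hz <= NSEC_PER_SEC);
    return ns / (NSEC_PER_SEC / hz);
}
```

Both helpers return the same result when hz equals HZ; only the
cost of the conversion differs.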
However, having to select the system HZ value at build time is
somewhat limiting. Distros have to make choices for their users
as to what the best HZ value would be balancing latency and
power usage.
With Android, this is a major issue, as we have one GKI binary
that runs across a wide array of devices from top of the line
flagship phones to watches. Balancing the choice for HZ is
difficult: we currently have HZ=250, but some devices would love
to have HZ=1000, while other devices aren’t willing to pay the
power cost of 4x the timer slots, resulting in shorter idle
times.
(As an aside, some suggested RCU_LAZY would avoid the cost
of bumping to HZ=1000, and it does indeed help a good bit, but
we still see higher power usage compared with HZ=250)
So I’ve been thinking and talking about an idea for a while to
try to address this: DynamicHZ.
The idea is we just set HZ=1000, with 1ms granular timers.
However, using a boot time argument, we can optionally program
the clockevent that generates the timer tick to fire at a lower
frequency. Effectively this is just like the lost ticks handling
done for SMIs. Jiffies accounting is driven by the elapsed time
the clocksource reports, so time progresses properly, and timers
for all the ticks we skip will fire when the next timer
interrupt occurs. So with dyn_hz=250, the interrupt fires every
fourth tick, and with dyn_hz=100, every tenth.
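
The relationship between dyn_hz, the interrupt period, and the
number of ticks accounted per interrupt can be sketched as
follows. This is a simplified standalone model, not the code
from the patches; dyn_tick_period_ns() and ticks_elapsed() are
hypothetical names, and the timestamps stand in for what the
clocksource would report.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 1000U                       /* 1ms granular ticks */
#define NSEC_PER_SEC 1000000000ULL
#define TICK_NSEC (NSEC_PER_SEC / HZ)

/* Interrupt period in nanoseconds for a requested dyn_hz. */
static uint64_t dyn_tick_period_ns(uint32_t dyn_hz)
{
    assert(dyn_hz > 0 && dyn_hz <= HZ);
    return NSEC_PER_SEC / dyn_hz;
}

/*
 * Whole HZ ticks that passed between *last_ns and now_ns.
 * *last_ns advances only by the time actually accounted, so any
 * sub-tick remainder carries over to the next interrupt.
 */
static uint64_t ticks_elapsed(uint64_t *last_ns, uint64_t now_ns)
{
    uint64_t ticks = (now_ns - *last_ns) / TICK_NSEC;

    *last_ns += ticks * TICK_NSEC;
    return ticks;
}
```

In this model an interrupt arriving one dyn_hz=250 period after
the last accounted time yields 4 ticks, and one dyn_hz=100
period yields 10.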
So far, this approach has seemed to work ok.
One area that needed adjustments was the cputime accounting, as
it assumes we only account one tick per interrupt, so I’ve
reworked some of that logic to pipe through the actual tick
count.
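
To illustrate the shape of that change, here is a toy model
that charges an explicit tick count rather than assuming one
tick per interrupt. The struct and function names are
hypothetical and do not match the kernel's cputime interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NSEC 1000000ULL  /* one tick at HZ=1000 */

/* Toy per-CPU time buckets (hypothetical, not the kernel's layout). */
struct cpu_time_model {
    uint64_t user_ns;
    uint64_t system_ns;
};

/*
 * Charge 'ticks' ticks to the user or system bucket and return
 * the updated bucket value. With one tick assumed per interrupt,
 * a dyn_hz=250 system would only be charged a quarter of the time.
 */
static uint64_t account_ticks_model(struct cpu_time_model *ct,
                                    int user_mode, uint64_t ticks)
{
    uint64_t delta = ticks * TICK_NSEC;

    assert(ct != 0);
    if (user_mode) {
        ct->user_ns += delta;
        return ct->user_ns;
    }
    ct->system_ns += delta;
    return ct->system_ns;
}
```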
Once that was addressed, in testing with HZ=1000, dyn_hz=250,
all of the time/timer tests I've tried seem to be working
properly. I don’t see any performance regressions on tests like
Geekbench (compared to HZ=250), nor have I observed any power
regressions.
Though as the tick and scheduler code are intertwined and
neither subsystem is particularly simple, I expect I may still
be missing or forgetting things. So I wanted to send this out
for review and feedback.
I’d love to hear your thoughts or concerns!
Also, I've not yet gotten this to work for the fixed
periodic-tick paths (you need a oneshot capable clockevent),
mostly because in that case we always just increment by a single
tick. For dyn_hz=250 or dyn_hz=100, calculating the periodic
tick count is pretty simple (4 ticks, 10 ticks). But for
dyn_hz=300, or other possible values, HZ doesn’t evenly
divide, so we would have to do a 3,3,4,3,3,4 style interval to
stay on time, and I’ve not yet thought through how to do
remainder handling efficiently.
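
One possible way to produce the 3,3,4 pattern is a running
remainder accumulator, sketched below. This is only an
illustration under stated assumptions, not something from the
patch series: struct tick_divider and its helpers are
hypothetical names, and the divisions happen once at init so the
per-interrupt path uses only an add and a compare.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 1000U  /* base tick rate from the proposal */

/* Hypothetical state for spreading HZ ticks over dyn_hz interrupts. */
struct tick_divider {
    uint32_t dyn_hz; /* interrupts per second */
    uint32_t base;   /* HZ / dyn_hz, computed once at init */
    uint32_t frac;   /* HZ % dyn_hz, computed once at init */
    uint32_t rem;    /* accumulated remainder, always < dyn_hz */
};

static struct tick_divider tick_divider_init(uint32_t dyn_hz)
{
    assert(dyn_hz > 0 && dyn_hz <= HZ);
    struct tick_divider d = { dyn_hz, HZ / dyn_hz, HZ % dyn_hz, 0 };
    return d;
}

/* Ticks to account for one periodic interrupt; no division on this path. */
static uint32_t tick_divider_next(struct tick_divider *d)
{
    uint32_t ticks = d->base;

    d->rem += d->frac;
    if (d->rem >= d->dyn_hz) {
        d->rem -= d->dyn_hz;
        ticks++;  /* pay back the accumulated fractional ticks */
    }
    return ticks;
}

/* Total ticks accounted over n interrupts, for checking the sequence. */
static uint32_t ticks_over_interrupts(uint32_t dyn_hz, uint32_t n)
{
    struct tick_divider d = tick_divider_init(dyn_hz);
    uint32_t total = 0;

    while (n--)
        total += tick_divider_next(&d);
    return total;
}
```

For dyn_hz=300, successive calls to tick_divider_next() return
3, 3, 4 and repeat, so 300 interrupts account exactly 1000
ticks. For dyn_hz=250 and dyn_hz=100 the remainder is zero and
every call returns 4 or 10 respectively.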
thanks
-john
Cc: Anna-Maria Behnsen <anna-maria@...utronix.de>
Cc: Frederic Weisbecker <frederic@...nel.org>
Cc: Ingo Molnar <mingo@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Ben Segall <bsegall@...gle.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Valentin Schneider <vschneid@...hat.com>
Cc: Stephen Boyd <sboyd@...nel.org>
Cc: Yury Norov <yury.norov@...il.com>
Cc: Bitao Hu <yaoma@...ux.alibaba.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: kernel-team@...roid.com
John Stultz (3):
time/tick: Pipe tick count down through cputime accounting
time/tick: Introduce a dyn_hz boot option
Kconfig: Add CONFIG_DYN_HZ_DEFAULT to specify the default dynhz= boot
option value
include/linux/kernel_stat.h | 4 ++--
include/linux/tick.h | 11 +++++++++--
kernel/Kconfig.hz | 19 +++++++++++++++++++
kernel/sched/cputime.c | 6 +++---
kernel/time/tick-common.c | 32 +++++++++++++++++++++++++++++++-
kernel/time/tick-legacy.c | 2 +-
kernel/time/tick-sched.c | 33 ++++++++++++++++++---------------
kernel/time/timekeeping.h | 2 +-
kernel/time/timer.c | 4 ++--
9 files changed, 86 insertions(+), 27 deletions(-)
--
2.48.1.262.g85cc9f2d1e-goog