Message-ID: <20231206163536.r9DcrsWQ@linutronix.de>
Date: Wed, 6 Dec 2023 17:35:36 +0100
From: Sebastian Siewior <bigeasy@...utronix.de>
To: Anna-Maria Behnsen <anna-maria@...utronix.de>
Cc: linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
John Stultz <jstultz@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Eric Dumazet <edumazet@...gle.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
Arjan van de Ven <arjan@...radead.org>,
"Paul E . McKenney" <paulmck@...nel.org>,
Frederic Weisbecker <frederic@...nel.org>,
Rik van Riel <riel@...riel.com>,
Steven Rostedt <rostedt@...dmis.org>,
Giovanni Gherdovich <ggherdovich@...e.cz>,
Lukasz Luba <lukasz.luba@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Srinivas Pandruvada <srinivas.pandruvada@...el.com>,
K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [PATCH v9 30/32] timers: Implement the hierarchical pull model
On 2023-12-01 10:26:52 [+0100], Anna-Maria Behnsen wrote:
…
> As long as a CPU is busy it expires both local and global timers. When a
> CPU goes idle it arms for the first expiring local timer. If the first
> expiring pinned (local) timer is before the first expiring movable timer,
> then no action is required because the CPU will wake up before the first
> movable timer expires. If the first expiring movable timer is before the
> first expiring pinned (local) timer, then this timer is queued into a idle
an
> timerqueue and eventually expired by some other active CPU.
s/some other/another ?
…
>
> Signed-off-by: Anna-Maria Behnsen <anna-maria@...utronix.de>
> ---
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index b6c9ac0c3712..ac3e888d053f 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -2103,6 +2104,64 @@ void timer_lock_remote_bases(unsigned int cpu)
…
> +static void timer_use_tmigr(unsigned long basej, u64 basem,
> + unsigned long *nextevt, bool *tick_stop_path,
> + bool timer_base_idle, struct timer_events *tevt)
> +{
> + u64 next_tmigr;
> +
> + if (timer_base_idle)
> + next_tmigr = tmigr_cpu_new_timer(tevt->global);
> + else if (tick_stop_path)
> + next_tmigr = tmigr_cpu_deactivate(tevt->global);
> + else
> + next_tmigr = tmigr_quick_check();
> +
> + /*
> + * If the CPU is the last going idle in timer migration hierarchy, make
> + * sure the CPU will wake up in time to handle remote timers.
> + * next_tmigr == KTIME_MAX if other CPUs are still active.
> + */
> + if (next_tmigr < tevt->local) {
> + u64 tmp;
> +
> + /* If we missed a tick already, force 0 delta */
> + if (next_tmigr < basem)
> + next_tmigr = basem;
> +
> + tmp = div_u64(next_tmigr - basem, TICK_NSEC);
Is this considered a hot path? Asking because u64 divs are nice if they
can be avoided ;)
I guess the original value is from fetch_next_timer_interrupt(). But
then you only need it if the caller (__get_next_timer_interrupt()) has
the `idle' value set. Otherwise the operation is pointless.
Would it somehow work to replace
base_local->is_idle = time_after(nextevt, basej + 1);
with maybe something like
base_local->is_idle = tevt.local > basem + TICK_NSEC
If so you could avoid the `nextevt' maneuver.
> + *nextevt = basej + (unsigned long)tmp;
> + tevt->local = next_tmigr;
> + }
> +}
> +# else
…
> @@ -2132,6 +2190,21 @@ static inline u64 __get_next_timer_interrupt(unsigned long basej, u64 basem,
> nextevt = fetch_next_timer_interrupt(basej, basem, base_local,
> base_global, &tevt);
>
> + /*
> + * When the when the next event is only one jiffie ahead there is no
If the next event is only one jiffy ahead then there is no
> + * need to call timer migration hierarchy related
> + * functions. @tevt->global will be KTIME_MAX, nevertheless if the next
> + * timer is a global timer. This is also true, when the timer base is
The second sentence is hard to parse.
> + * idle.
> + *
> + * The proper timer migration hierarchy function depends on the callsite
> + * and whether timer base is idle or not. @nextevt will be updated when
> + * this CPU needs to handle the first timer migration hierarchy event.
> + */
> + if (time_after(nextevt, basej + 1))
> + timer_use_tmigr(basej, basem, &nextevt, idle,
> + base_local->is_idle, &tevt);
> +
> /*
> * We have a fresh next event. Check whether we can forward the
> * base.
> diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> new file mode 100644
> index 000000000000..05cd8f1bc45d
> --- /dev/null
> +++ b/kernel/time/timer_migration.c
> @@ -0,0 +1,1636 @@
…
> +/*
> + * The timer migration mechanism is built on a hierarchy of groups. The
> + * lowest level group contains CPUs, the next level groups of CPU groups
> + * and so forth. The CPU groups are kept per node so for the normal case
> + * lock contention won't happen across nodes. Depending on the number of
> + * CPUs per node even the next level might be kept as groups of CPU groups
> + * per node and only the levels above cross the node topology.
> + *
> + * Example topology for a two node system with 24 CPUs each.
> + *
> + * LVL 2 [GRP2:0]
> + * GRP1:0 = GRP1:M
> + *
> + * LVL 1 [GRP1:0] [GRP1:1]
> + * GRP0:0 - GRP0:2 GRP0:3 - GRP0:5
> + *
> + * LVL 0 [GRP0:0] [GRP0:1] [GRP0:2] [GRP0:3] [GRP0:4] [GRP0:5]
> + * CPUS 0-7 8-15 16-23 24-31 32-39 40-47
In the CPUS list, the separator between 24-31 and 32-39 is a tab while
the other separators are spaces. Could you please align it with spaces?
Judging from the top you have tabstop=8 but here tabstop=4 looks "nice".
> + *
> + * The groups hold a timer queue of events sorted by expiry time. These
> + * queues are updated when CPUs go in idle. When they come out of idle
> + * ignore flag of events is set.
> + *
Sebastian