Message-ID: <20241110225545.GA1579217@google.com>
Date: Sun, 10 Nov 2024 22:55:45 +0000
From: Joel Fernandes <joel@...lfernandes.org>
To: linux-kernel@...r.kernel.org,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC 3/3] tick-sched: Replace jiffie readout with idle_entrytime
On Fri, Nov 08, 2024 at 05:48:36PM +0000, Joel Fernandes (Google) wrote:
> This solves the issue where jiffies can be stale and inaccurate.
>
> Putting some prints, I see that basemono can be quite stale:
> tick_nohz_next_event: basemono=18692000000 basemono_from_idle_entrytime=18695000000
>
> Since we have 'now' in ts->idle_entrytime, we can just use that. It is
> more accurate, cleaner, reduces lines of code and reduces any lock
> contention with the seq locks.
>
> I was also concerned about the issue where jiffies is not updated for a
> long time, and then we receive a non-tick interrupt in the future. Relying
> on a stale jiffies value and using it as the base can be inaccurate when
> determining whether the next event occurs within the next tick. Fix that.
>
> XXX: Need to fix an issue in idle accounting, which does 'jiffies -
> idle_entrytime'. If idle_entrytime is more current than jiffies, it
> could cause negative values. I could potentially replace jiffies with
> idle_exittime in this computation to fix that.
>
> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> ---
> kernel/time/tick-sched.c | 27 +++++++--------------------
> 1 file changed, 7 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 4aa64266f2b0..22a4f96d9585 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -860,24 +860,6 @@ static inline bool local_timer_softirq_pending(void)
> return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
> }
>
> -/*
> - * Read jiffies and the time when jiffies were updated last
> - */
> -u64 get_jiffies_update(unsigned long *basej)
> -{
> - unsigned long basejiff;
> - unsigned int seq;
> - u64 basemono;
> -
> - do {
> - seq = read_seqcount_begin(&jiffies_seq);
> - basemono = last_jiffies_update;
> - basejiff = jiffies;
> - } while (read_seqcount_retry(&jiffies_seq, seq));
> - *basej = basejiff;
> - return basemono;
> -}
> -
> /**
> * tick_nohz_next_event() - return the clock monotonic based next event
> * @ts: pointer to tick_sched struct
> @@ -887,14 +869,19 @@ u64 get_jiffies_update(unsigned long *basej)
> * *%0 - When the next event is a maximum of TICK_NSEC in the future
> * and the tick is not stopped yet
> * *%next_event - Next event based on clock monotonic
> + *
> + * Note: ts->idle_entrytime is updated with 'now' via tick_nohz_idle_enter().
> */
> static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
> {
> - u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo;
> + u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo, boot_ticks;
> unsigned long basejiff;
> int tick_cpu;
>
> - basemono = get_jiffies_update(&basejiff);
> + boot_ticks = DIV_ROUND_DOWN_ULL(ts->idle_entrytime, TICK_NSEC);
> + basejiff = boot_ticks + INITIAL_JIFFIES;
> + basemono = boot_ticks * TICK_NSEC;
> +
There is a bug here: I end up overcounting basejiff. With something like the
change below, basejiff becomes equivalent to what the previous code computed,
so it should be good. I'll work more on it this week...
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b13076b79..5387c67eea7a 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -34,6 +34,8 @@ DEFINE_PER_CPU(struct tick_device, tick_cpu_device);
*/
ktime_t tick_next_period;
+ktime_t tick_first_period;
+
/*
* tick_do_timer_cpu is a timer core internal variable which holds the CPU NR
* which is responsible for calling do_timer(), i.e. the timekeeping stuff. This
@@ -219,6 +221,7 @@ static void tick_setup_device(struct tick_device *td,
if (READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
WRITE_ONCE(tick_do_timer_cpu, cpu);
tick_next_period = ktime_get();
+ tick_first_period = tick_next_period;
#ifdef CONFIG_NO_HZ_FULL
/*
* The boot CPU may be nohz_full, in which case set
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 5f2105e637bd..a15721516a85 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -20,6 +20,7 @@ struct timer_events {
DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
extern ktime_t tick_next_period;
+extern ktime_t tick_first_period;
extern int tick_do_timer_cpu __read_mostly;
extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8a245f8ceb56..8fdfda4b8af3 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -895,11 +896,23 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo, boot_ticks;
unsigned long basejiff;
int tick_cpu;
boot_ticks = DIV_ROUND_DOWN_ULL(ts->idle_entrytime, TICK_NSEC);
basejiff = boot_ticks + INITIAL_JIFFIES;
basemono = boot_ticks * TICK_NSEC;
+ /*
+ * There is some time that passes between when clocksource starts and the
+ * first time tick device is setup. Offset basejiff by that.
+ */
+ basejiff -= DIV_ROUND_DOWN_ULL(tick_first_period, TICK_NSEC);
+
ts->last_jiffies = basejiff;
ts->timer_expires_base = basemono;
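
For illustration, here is a minimal userspace sketch of the arithmetic in the
hunk above. It is not kernel code: SKETCH_TICK_NSEC (a 1 ms tick, i.e. HZ=1000),
SKETCH_INITIAL_JIFFIES (set to 0 to keep the numbers readable) and the helper
sketch_base_from_entry() are hypothetical stand-ins for TICK_NSEC,
INITIAL_JIFFIES and the computation in tick_nohz_next_event().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the kernel derives these from HZ. */
#define SKETCH_TICK_NSEC       1000000ULL /* 1 ms per tick (HZ=1000) */
#define SKETCH_INITIAL_JIFFIES 0ULL       /* simplified; not the kernel value */

struct sketch_base {
	unsigned long basejiff; /* jiffies-style tick count */
	uint64_t basemono;      /* entry time rounded down to a tick boundary */
};

/*
 * Hypothetical helper mirroring the hunk: derive both bases from the idle
 * entry timestamp, then offset basejiff by the ticks that elapsed before
 * the tick device was first set up.
 */
static struct sketch_base sketch_base_from_entry(uint64_t idle_entrytime_ns,
						 uint64_t first_period_ns)
{
	struct sketch_base b;
	uint64_t boot_ticks;

	/* Idle entry cannot precede the first tick period in this sketch. */
	assert(idle_entrytime_ns >= first_period_ns);

	boot_ticks = idle_entrytime_ns / SKETCH_TICK_NSEC; /* rounds down */
	b.basemono = boot_ticks * SKETCH_TICK_NSEC;
	b.basejiff = (unsigned long)(boot_ticks + SKETCH_INITIAL_JIFFIES -
				     first_period_ns / SKETCH_TICK_NSEC);
	return b;
}
```

With made-up inputs of 18695400000 ns for the idle entry time and 3200000 ns
for the first period, this gives basemono = 18695000000 and basejiff = 18692.
One thing the sketch makes visible: subtracting two separately rounded-down
quotients can differ by one tick from rounding down the difference (for
example with a first period of 3900000 ns), which may or may not matter here.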