Message-ID: <20241110225545.GA1579217@google.com>
Date: Sun, 10 Nov 2024 22:55:45 +0000
From: Joel Fernandes <joel@...lfernandes.org>
To: linux-kernel@...r.kernel.org,
	Anna-Maria Behnsen <anna-maria@...utronix.de>,
	Frederic Weisbecker <frederic@...nel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC 3/3] tick-sched: Replace jiffie readout with idle_entrytime

On Fri, Nov 08, 2024 at 05:48:36PM +0000, Joel Fernandes (Google) wrote:
> This solves the issue where jiffies can be stale and inaccurate.
> 
> Putting some prints, I see that basemono can be quite stale:
> tick_nohz_next_event: basemono=18692000000 basemono_from_idle_entrytime=18695000000
> 
> Since we have 'now' in ts->idle_entrytime, we can just use that. It is
> more accurate, cleaner, reduces lines of code and reduces any lock
> contention with the seq locks.
> 
> I was also concerned about issue where jiffies is not updated for a long
> time, and then we receive a non-tick interrupt in the future. Relying on
> stale jiffies value and using that as base can be inaccurate to determine
> whether next event occurs within next tick. Fix that.
> 
> XXX: Need to fix issue in idle accounting which does 'jiffies -
> idle_entrytime'. If idle_entrytime is more current than jiffies, it
> could cause negative values. I could replace jiffies with idle_exittime
> in this computation potentially to fix that.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> ---
>  kernel/time/tick-sched.c | 27 +++++++--------------------
>  1 file changed, 7 insertions(+), 20 deletions(-)
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 4aa64266f2b0..22a4f96d9585 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -860,24 +860,6 @@ static inline bool local_timer_softirq_pending(void)
>  	return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
>  }
>  
> -/*
> - * Read jiffies and the time when jiffies were updated last
> - */
> -u64 get_jiffies_update(unsigned long *basej)
> -{
> -	unsigned long basejiff;
> -	unsigned int seq;
> -	u64 basemono;
> -
> -	do {
> -		seq = read_seqcount_begin(&jiffies_seq);
> -		basemono = last_jiffies_update;
> -		basejiff = jiffies;
> -	} while (read_seqcount_retry(&jiffies_seq, seq));
> -	*basej = basejiff;
> -	return basemono;
> -}
> -
>  /**
>   * tick_nohz_next_event() - return the clock monotonic based next event
>   * @ts:		pointer to tick_sched struct
> @@ -887,14 +869,19 @@ u64 get_jiffies_update(unsigned long *basej)
>   * *%0		- When the next event is a maximum of TICK_NSEC in the future
>   *		  and the tick is not stopped yet
>   * *%next_event	- Next event based on clock monotonic
> + *
> + * Note: ts->idle_entrytime is updated with 'now' via tick_nohz_idle_enter().
>   */
>  static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
>  {
> -	u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo;
> +	u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo, boot_ticks;
>  	unsigned long basejiff;
>  	int tick_cpu;
>  
> -	basemono = get_jiffies_update(&basejiff);
> +	boot_ticks = DIV_ROUND_DOWN_ULL(ts->idle_entrytime, TICK_NSEC);
> +	basejiff = boot_ticks + INITIAL_JIFFIES;
> +	basemono = boot_ticks * TICK_NSEC;
> +

There is a bug here: I end up overcounting basejiff. With something like the
change below, basejiff matches what the previous code produced, so it should
be good. I'll work more on it this week...

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b13076b79..5387c67eea7a 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -34,6 +34,8 @@ DEFINE_PER_CPU(struct tick_device, tick_cpu_device);
  */
 ktime_t tick_next_period;

+ktime_t tick_first_period;
+
 /*
  * tick_do_timer_cpu is a timer core internal variable which holds the CPU NR
  * which is responsible for calling do_timer(), i.e. the timekeeping stuff. This
@@ -219,6 +221,7 @@ static void tick_setup_device(struct tick_device *td,
                if (READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
                        WRITE_ONCE(tick_do_timer_cpu, cpu);
                        tick_next_period = ktime_get();
+                       tick_first_period = tick_next_period;
 #ifdef CONFIG_NO_HZ_FULL
                        /*
                         * The boot CPU may be nohz_full, in which case set
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 5f2105e637bd..a15721516a85 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -20,6 +20,7 @@ struct timer_events {

 DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 extern ktime_t tick_next_period;
+extern ktime_t tick_first_period;
 extern int tick_do_timer_cpu __read_mostly;

 extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8a245f8ceb56..8fdfda4b8af3 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -895,11 +896,23 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
        u64 basemono, next_tick, delta, expires, delta_hr, next_hr_wo, boot_ticks;
        unsigned long basejiff;
        int tick_cpu;

        boot_ticks = DIV_ROUND_DOWN_ULL(ts->idle_entrytime, TICK_NSEC);
        basejiff = boot_ticks + INITIAL_JIFFIES;
        basemono = boot_ticks * TICK_NSEC;

+       /*
+        * Some time passes between when the clocksource starts and the first
+        * time the tick device is set up. Offset basejiff by that.
+        */
+       basejiff -= DIV_ROUND_DOWN_ULL(tick_first_period, TICK_NSEC);
+
        ts->last_jiffies = basejiff;
        ts->timer_expires_base = basemono;
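
The overcount described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: MODEL_TICK_NSEC, MODEL_INITIAL_JIFFIES and all three function names are hypothetical, and jiffies_model() assumes jiffies starts at its initial value when the tick device is first set up and then advances once per tick with no stalls.

```c
#include <assert.h>
#include <stdint.h>

/* Model constants: assumptions for illustration, not the kernel's values. */
#define MODEL_TICK_NSEC       UINT64_C(1000000) /* assumes HZ=1000, 1 ms per tick */
#define MODEL_INITIAL_JIFFIES UINT64_C(0)       /* the kernel uses a non-zero value */

/* RFC computation: whole ticks since the clocksource started. */
uint64_t basejiff_rfc(uint64_t now_ns)
{
	return now_ns / MODEL_TICK_NSEC + MODEL_INITIAL_JIFFIES;
}

/* Follow-up computation: drop the whole ticks that passed before first setup. */
uint64_t basejiff_followup(uint64_t now_ns, uint64_t first_period_ns)
{
	return basejiff_rfc(now_ns) - first_period_ns / MODEL_TICK_NSEC;
}

/* Idealized jiffies: starts counting only at the first tick period. */
uint64_t jiffies_model(uint64_t now_ns, uint64_t first_period_ns)
{
	/* The subtraction below would wrap if 'now' preceded the first period. */
	assert(now_ns >= first_period_ns);
	return MODEL_INITIAL_JIFFIES +
	       (now_ns - first_period_ns) / MODEL_TICK_NSEC;
}
```

With now = 10 ms and a first tick period at 3 ms, basejiff_rfc() yields 10 while jiffies_model() yields 7, so the RFC version overcounts by the 3 ticks that elapsed before the tick device was set up, and basejiff_followup() also yields 7. In this model the follow-up value can still differ from jiffies_model() by one tick when the first period is not tick-aligned, because the two divisions truncate separately (for example now = 10.2 ms and first period = 2.5 ms give 8 versus 7).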

