Date:	Tue, 12 May 2009 16:58:47 -0700
From:	john stultz <johnstul@...ibm.com>
To:	Jon Hunter <jon-hunter@...com>
Cc:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep
 for more than 2.15 seconds

On Tue, 2009-05-12 at 18:35 -0500, Jon Hunter wrote:
> john stultz wrote:
> > Yea. NSEC_PER_SEC/HZ would probably be safe. I was initially thinking
> > being more paranoid and just dividing it in half, but that's probably a
> > bit silly.
> 
> Thanks, I have added the code to subtract NSEC_PER_SEC/HZ. Should we 
> have any concerns about the adjustment of the mult value? This is the 
> only thing that could impact the value returned from 
> timekeeping_max_deferment(). I am not familiar with exactly how this is 
> working so just wanted to ask.

Well, the mult adjustments should be quite small, especially compared to
the NSEC_PER_SEC/HZ adjustment.

Hmm... Although, I guess we could get bitten if the max_deferment was
like an hour, and the adjustment was large enough that, scaled out over
that interval, we ended up being a second late or so. So you have a point.

But since the clockevent driver is not scaled, we probably can get away
with using the orig_mult value instead of mult, and be ok.
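
To put rough numbers on the concern above, here is a minimal userspace sketch of how a small mult adjustment scales over a long deferment. It assumes the conversion has the same shape as cyc2ns(), i.e. (cycles * mult) >> shift. The helper names cyc2ns_sketch() and mult_skew_ns() are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/*
 * Same shape as the kernel's cyc2ns(): (cycles * mult) >> shift.
 * Userspace sketch only; no overflow handling.
 */
static int64_t cyc2ns_sketch(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (int64_t)((cycles * mult) >> shift);
}

/*
 * Difference, in ns, between converting the same cycle count with an
 * adjusted mult and with the unadjusted orig_mult.
 */
static int64_t mult_skew_ns(uint64_t cycles, uint32_t orig_mult,
			    uint32_t adj_mult, uint32_t shift)
{
	return cyc2ns_sketch(cycles, adj_mult, shift) -
	       cyc2ns_sketch(cycles, orig_mult, shift);
}
```

As a worked case with purely hypothetical values: for a 1 MHz counter with shift = 10 and orig_mult = 1024000 (1000 ns per cycle), one hour is 3600000000 cycles. If mult had been adjusted to 1024300 (roughly 293 ppm), mult_skew_ns() gives 1054687500 ns, i.e. about a second over the hour. Computing the deferment from orig_mult sidesteps that skew.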

Alternatively instead of NSEC_PER_SEC/HZ, we could always drop the
larger of NSEC_PER_SEC/HZ or max_deferment/10? That way we should scale
up without a problem. 
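
A minimal sketch of that alternative, assuming the deferment is already available in nanoseconds. The function name deferment_with_margin() and the SKETCH_ constants are hypothetical, and the tick rate of 100 is only an example value:

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL
#define SKETCH_HZ           100	/* assumed tick rate, illustration only */

/*
 * Subtract the larger of one tick period or 10% of the raw deferment,
 * so the safety margin grows along with the deferment.
 * The result can still be negative when the raw deferment is shorter
 * than one tick; guarding against that is left to the caller.
 */
static int64_t deferment_with_margin(int64_t max_deferment_ns)
{
	int64_t tick_ns = SKETCH_NSEC_PER_SEC / SKETCH_HZ;
	int64_t tenth = max_deferment_ns / 10;
	int64_t margin = tick_ns > tenth ? tick_ns : tenth;

	return max_deferment_ns - margin;
}
```

With these example values, a one hour deferment loses six minutes of margin, while a 50 ms deferment loses exactly one 10 ms tick.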

I suspect it would be tough to hit this issue though.

> > As far as the decision to defer if the next event is greater than one jiffy
> > away, that seems reasonable, but I'd not embed that into the
> > timekeeping_max_deferment().
> > 
> > I'm suggesting we drop timekeeping_max_deferment() down since that's
> > the absolute maximum and we're sure to break if we actually wait that
> > long (since the time between clocksource reads would certainly be longer
> > due to execution delay). 1HZ seems reasonable, since we should easily be
> > able to run the tick code twice in that time, as well as it should be
> > easily within the interrupt programming granularity.
> > 
> > Any additional decisions as to how far out we should be before we start
> > skipping ticks would be up to the tick resched code, and shouldn't be in
> > the timekeeping function.
> > 
> > Sound sane? If so add that in and I'll ack it.
> 
> Yes, agree. See below. By the way I have kept the below patch separate 
> from the original I posted here:
> 
> http://marc.info/?l=linux-kernel&m=124026224019895&w=2
> 
> I was not sure if you would prefer to keep these as two patch series or 
> make it one single patch. Let me know if you would like me to combine or 
> re-post as a two patch series.

Two patches should be fine.

> Please note that the environment I have been running some basic tests on 
> is a single core ARM device. I just wanted to let you know in case you 
> have any concerns with this.
> 
> > This looks *much* better to me. Thanks for reworking it!
> 
> Great! No problem. Thanks for your help and feedback.
> 
> Cheers
> Jon
> 
> 
> Signed-off-by: Jon Hunter <jon-hunter@...com>

Looks good overall. We may want to add the -10% (or -5%) to be totally
safe, but that's likely just me being paranoid.

Also one more safety issue below.

Otherwise,
Acked-by: John Stultz <johnstul@...ibm.com>


thanks
-john


> ---
>   include/linux/time.h      |    1 +
>   kernel/time/tick-sched.c  |   36 +++++++++++++++++++++++++-----------
>   kernel/time/timekeeping.c |   19 +++++++++++++++++++
>   3 files changed, 45 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 242f624..090be07 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);
> 
>   extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
>   extern int timekeeping_valid_for_hres(void);
> +extern s64 timekeeping_max_deferment(void);
>   extern void update_wall_time(void);
>   extern void update_xtime_cache(u64 nsec);
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index d3f1ef4..f0155ae 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   	ktime_t last_update, expires, now;
>   	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
>   	int cpu;
> +	s64 time_delta, max_time_delta;
> 
>   	local_irq_save(flags);
> 
> @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		seq = read_seqbegin(&xtime_lock);
>   		last_update = last_jiffies_update;
>   		last_jiffies = jiffies;
> +		max_time_delta = timekeeping_max_deferment();
>   	} while (read_seqretry(&xtime_lock, seq));
> 
>   	/* Get the next timer wheel timer */
> @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
>   	if ((long)delta_jiffies >= 1) {
> 
>   		/*
> -		* calculate the expiry time for the next timer wheel
> -		* timer
> -		*/
> -		expires = ktime_add_ns(last_update, tick_period.tv64 *
> -				   delta_jiffies);
> +		 * Calculate the time delta for the next timer event.
> +		 * If the time delta exceeds the maximum time delta
> +		 * permitted by the current clocksource then adjust
> +		 * the time delta accordingly to ensure the
> +		 * clocksource does not wrap.
> +		 */
> +		time_delta = tick_period.tv64 * delta_jiffies;
> +
> +		if (time_delta > max_time_delta)
> +			time_delta = max_time_delta;
> +
> +		/*
> +		 * calculate the expiry time for the next timer wheel
> +		 * timer
> +		 */
> +		expires = ktime_add_ns(last_update, time_delta);
> 
>   		/*
>   		 * If this cpu is the one which updates jiffies, then
> @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		if (cpu == tick_do_timer_cpu)
>   			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> 
> -		if (delta_jiffies > 1)
> +		if (time_delta > tick_period.tv64)
>   			cpumask_set_cpu(cpu, nohz_cpu_mask);
> 
>   		/* Skip reprogram of event if its not changed */
> @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		ts->idle_sleeps++;
> 
>   		/*
> -		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
> -		 * there is no timer pending or at least extremly far
> -		 * into the future (12 days for HZ=1000). In this case
> -		 * we simply stop the tick timer:
> +		 * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
> +		 * signals that there is no timer pending or at least
> +		 * extremely far into the future (12 days for HZ=1000).
> +		 * In this case we simply stop the tick timer:
>   		 */
> -		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
> +		if (unlikely(time_delta >=
> +				(tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
>   			ts->idle_expires.tv64 = KTIME_MAX;
>   			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
>   				hrtimer_cancel(&ts->sched_timer);
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 687dff4..7617fbe 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -271,6 +271,25 @@ int timekeeping_valid_for_hres(void)
>   }
> 
>   /**
> + * timekeeping_max_deferment - Returns max time the clocksource can be deferred
> + *
> + * IMPORTANT: Must be called with xtime_lock held!
> + */
> +s64 timekeeping_max_deferment(void)
> +{
> +	s64 max_nsecs;
> +
> +	/*
> +	 * Limit the time the clocksource can be
> +	 * deferred by one jiffie period to ensure
> +	 * that the clocksource will not wrap.
> +	 */
> +	max_nsecs = cyc2ns(clock, clock->mask) - (NSEC_PER_SEC/HZ);
> +

This seems really unlikely, but you might want to add something like:

	if (max_nsecs < 0)
		max_nsecs = 0;

To avoid negative underflows. I don't see how a system could be running
in highres mode if the clocksource isn't continuous for longer than a
tick, but it's probably a good idea nonetheless.
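
For illustration, a userspace sketch of the whole calculation with that guard in place. It assumes the conversion is (mask * mult) >> shift, as in cyc2ns(), and takes the clocksource fields as plain parameters. The name max_deferment_sketch() and all the numbers below are hypothetical:

```c
#include <stdint.h>

/*
 * Longest interval, in ns, before a counter with the given mask wraps,
 * minus one tick period of slack, clamped so it never goes negative.
 */
static int64_t max_deferment_sketch(uint64_t mask, uint32_t mult,
				    uint32_t shift, int64_t tick_ns)
{
	int64_t max_nsecs = (int64_t)((mask * mult) >> shift) - tick_ns;

	if (max_nsecs < 0)
		max_nsecs = 0;

	return max_nsecs;
}
```

For example, a made-up 16-bit counter at 10 MHz (mult = 102400, shift = 10) wraps after about 6.55 ms, which is less than a 10 ms tick, so without the guard the result would be negative. With it, the function returns 0.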


> +	return max_nsecs;
> +}
> +
> +/**
>    * read_persistent_clock -  Return time in seconds from the persistent clock.
>    *
>    * Weak dummy function for arches that do not yet support it.
