Message-ID: <20120504120455.GB4413@somewhere.redhat.com>
Date:	Fri, 4 May 2012 14:04:58 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Gilad Ben-Yossef <gilad@...yossef.com>
Cc:	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	Tejun Heo <tj@...nel.org>, John Stultz <johnstul@...ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Mel Gorman <mel@....ul.ie>, Mike Frysinger <vapier@...too.org>,
	David Rientjes <rientjes@...gle.com>,
	Hugh Dickins <hughd@...gle.com>,
	Minchan Kim <minchan.kim@...il.com>,
	Konstantin Khlebnikov <khlebnikov@...nvz.org>,
	Christoph Lameter <cl@...ux.com>,
	Chris Metcalf <cmetcalf@...era.com>,
	Hakan Akkan <hakanakkan@...il.com>,
	Max Krasnyansky <maxk@...lcomm.com>, linux-mm@...ck.org
Subject: Re: [PATCH v1 1/6] timer: make __next_timer_interrupt explicit about
 no future event

On Thu, May 03, 2012 at 05:55:57PM +0300, Gilad Ben-Yossef wrote:
> Current timer code fails to correctly return a value meaning
> that there is no future timer event, with the result that
> the timer keeps getting re-armed in HZ one shot mode even
> when we could turn it off, generating unneeded interrupts.
> This patch attempts to fix this problem.
> 
> What is happening is that when __next_timer_interrupt() wishes
> to return a value that signifies "there is no future timer
> event", it returns (base->timer_jiffies + NEXT_TIMER_MAX_DELTA).
> 
> However, the code in tick_nohz_stop_sched_tick(), which calls
> __next_timer_interrupt() via get_next_timer_interrupt(),
> compares the return value to (last_jiffies + NEXT_TIMER_MAX_DELTA)
> to see if the timer needs to be re-armed.
> 
> base->timer_jiffies != last_jiffies and so
> tick_nohz_stop_sched_tick() interprets the return value as
> an indication that there is a distant future event 12 days
> from now and programs the timer to fire next after KTIME_MAX
> nsecs instead of not arming it. This ends up causing
> a needless interrupt once every KTIME_MAX nsecs.

Good catch! So if I understand correctly, base->timer_jiffies can
lag behind last_jiffies. If we return
base->timer_jiffies + NEXT_TIMER_MAX_DELTA, the next_jiffies - last_jiffies
diff gives a delta that is slightly less than NEXT_TIMER_MAX_DELTA.

And this can indeed happen: if no timer list has been executed since
jiffies was last updated, timer_jiffies can lag behind last_jiffies.

This is harmless but causes the timer to be armed needlessly.
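
To make the arithmetic concrete, here is a small userspace sketch of the
mismatch. It is not the kernel code: the helper names are made up for
illustration, the caller's check is reduced to a single delta comparison,
and the value of NEXT_TIMER_MAX_DELTA is assumed rather than taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, standing in for the kernel constant. */
#define NEXT_TIMER_MAX_DELTA ((1UL << 30) - 1)

/* Hypothetical helper: what the old __next_timer_interrupt() returned
 * when it found no pending timer. */
static unsigned long old_no_event_value(unsigned long timer_jiffies)
{
	return timer_jiffies + NEXT_TIMER_MAX_DELTA;
}

/* Hypothetical, simplified form of the caller's check: the tick may
 * stay off only if the delta reaches NEXT_TIMER_MAX_DELTA. */
static bool caller_sees_no_event(unsigned long next_jiffies,
				 unsigned long last_jiffies)
{
	unsigned long delta = next_jiffies - last_jiffies;

	return delta >= NEXT_TIMER_MAX_DELTA;
}

/* True when a base that lags `lag` jiffies behind last_jiffies makes
 * the caller miss the "no event" case. */
static bool lag_defeats_check(unsigned long last_jiffies, unsigned long lag)
{
	/* The model only makes sense for lags below the maximum delta. */
	assert(lag <= NEXT_TIMER_MAX_DELTA);

	return !caller_sees_no_event(old_no_event_value(last_jiffies - lag),
				     last_jiffies);
}
```

With no lag the check recognises "no event"; with any non-zero lag the
delta falls short of NEXT_TIMER_MAX_DELTA and the timer gets armed.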

I just have a small comment below:

> 
> I've noticed a similar but slightly different fix to the
> same problem in the Tilera kernel tree from Chris M. (I
> wrote this before seeing that one), so some variation of this
> fix has been in use on real hardware for some time now.
> 
> Signed-off-by: Gilad Ben-Yossef <gilad@...yossef.com>
> CC: Thomas Gleixner <tglx@...utronix.de>
> CC: Tejun Heo <tj@...nel.org>
> CC: John Stultz <johnstul@...ibm.com>
> CC: Andrew Morton <akpm@...ux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
> CC: Mel Gorman <mel@....ul.ie>
> CC: Mike Frysinger <vapier@...too.org>
> CC: David Rientjes <rientjes@...gle.com>
> CC: Hugh Dickins <hughd@...gle.com>
> CC: Minchan Kim <minchan.kim@...il.com>
> CC: Konstantin Khlebnikov <khlebnikov@...nvz.org>
> CC: Christoph Lameter <cl@...ux.com>
> CC: Chris Metcalf <cmetcalf@...era.com>
> CC: Hakan Akkan <hakanakkan@...il.com>
> CC: Max Krasnyansky <maxk@...lcomm.com>
> CC: Frederic Weisbecker <fweisbec@...il.com>
> CC: linux-kernel@...r.kernel.org
> CC: linux-mm@...ck.org
> ---
>  kernel/timer.c |   31 +++++++++++++++++++++----------
>  1 files changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/timer.c b/kernel/timer.c
> index a297ffc..32ba64a 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1187,11 +1187,13 @@ static inline void __run_timers(struct tvec_base *base)
>   * is used on S/390 to stop all activity when a CPU is idle.
>   * This function needs to be called with interrupts disabled.
>   */
> -static unsigned long __next_timer_interrupt(struct tvec_base *base)
> +static bool __next_timer_interrupt(struct tvec_base *base,
> +					unsigned long *next_timer)
>  {
>  	unsigned long timer_jiffies = base->timer_jiffies;
>  	unsigned long expires = timer_jiffies + NEXT_TIMER_MAX_DELTA;
> -	int index, slot, array, found = 0;
> +	int index, slot, array;
> +	bool found = false;
>  	struct timer_list *nte;
>  	struct tvec *varray[4];
>  
> @@ -1202,12 +1204,12 @@ static unsigned long __next_timer_interrupt(struct tvec_base *base)
>  			if (tbase_get_deferrable(nte->base))
>  				continue;
>  
> -			found = 1;
> +			found = true;
>  			expires = nte->expires;
>  			/* Look at the cascade bucket(s)? */
>  			if (!index || slot < index)
>  				goto cascade;
> -			return expires;
> +			goto out;
>  		}
>  		slot = (slot + 1) & TVR_MASK;
>  	} while (slot != index);
> @@ -1233,7 +1235,7 @@ cascade:
>  				if (tbase_get_deferrable(nte->base))
>  					continue;
>  
> -				found = 1;
> +				found = true;
>  				if (time_before(nte->expires, expires))
>  					expires = nte->expires;
>  			}
> @@ -1245,7 +1247,7 @@ cascade:
>  				/* Look at the cascade bucket(s)? */
>  				if (!index || slot < index)
>  					break;
> -				return expires;
> +				goto out;
>  			}
>  			slot = (slot + 1) & TVN_MASK;
>  		} while (slot != index);
> @@ -1254,7 +1256,10 @@ cascade:
>  			timer_jiffies += TVN_SIZE - index;
>  		timer_jiffies >>= TVN_BITS;
>  	}
> -	return expires;
> +out:
> +	if (found)
> +		*next_timer = expires;
> +	return found;
>  }
>  
>  /*
> @@ -1317,9 +1322,15 @@ unsigned long get_next_timer_interrupt(unsigned long now)
>  	if (cpu_is_offline(smp_processor_id()))
>  		return now + NEXT_TIMER_MAX_DELTA;
>  	spin_lock(&base->lock);
> -	if (time_before_eq(base->next_timer, base->timer_jiffies))
> -		base->next_timer = __next_timer_interrupt(base);
> -	expires = base->next_timer;
> +	if (time_before_eq(base->next_timer, base->timer_jiffies)) {
> +
> +		if (__next_timer_interrupt(base, &expires))
> +			base->next_timer = expires;
> +		else
> +			expires = now + NEXT_TIMER_MAX_DELTA;

I believe you can update base->next_timer to now + NEXT_TIMER_MAX_DELTA
in this case, so that any later exit from an idle interrupt that calls
tick_nohz_stop_sched_tick() does not pay the overhead of
__next_timer_interrupt() again.
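
Roughly what I mean, as a userspace model rather than a patch. Everything
named model_* is hypothetical, the wheel scan is replaced by a counter,
and the value of NEXT_TIMER_MAX_DELTA is assumed. The model also does not
cover the timer-add path, which would have to pull next_timer back in
when a new timer is queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, standing in for the kernel constant. */
#define NEXT_TIMER_MAX_DELTA ((1UL << 30) - 1)

/* Wrap-safe "a <= b", in the style of the kernel's time_before_eq(). */
static bool before_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) <= 0;
}

/* Hypothetical stand-in for struct tvec_base. */
struct model_base {
	unsigned long timer_jiffies;
	unsigned long next_timer;
	bool has_timer;			/* is any timer pending? */
	unsigned long timer_expires;	/* its expiry, if so */
	unsigned int scans;		/* how often the "wheel" was scanned */
};

/* Stand-in for the bool-returning __next_timer_interrupt(). */
static bool model_scan(struct model_base *base, unsigned long *next_timer)
{
	base->scans++;
	if (base->has_timer)
		*next_timer = base->timer_expires;
	return base->has_timer;
}

/* Stand-in for get_next_timer_interrupt(); cache_no_event selects
 * whether the "no event" result is stored in base->next_timer. */
static unsigned long model_get_next(struct model_base *base,
				    unsigned long now, bool cache_no_event)
{
	unsigned long expires = base->next_timer;

	if (before_eq(base->next_timer, base->timer_jiffies)) {
		if (model_scan(base, &expires)) {
			base->next_timer = expires;
		} else {
			expires = now + NEXT_TIMER_MAX_DELTA;
			if (cache_no_event)
				base->next_timer = expires;
		}
	}
	return expires;
}

/* Two back-to-back idle entries with no timer pending. */
static unsigned int scans_after_two_calls(bool cache_no_event)
{
	struct model_base base = {
		.timer_jiffies = 100, .next_timer = 100, .has_timer = false,
	};
	unsigned long first = model_get_next(&base, 100, cache_no_event);
	unsigned long second = model_get_next(&base, 100, cache_no_event);

	/* Caching must not change the value reported to the caller. */
	assert(first == second);
	return base.scans;
}
```

In this model the second call scans again when the result is not cached,
and skips the scan when it is.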