lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.02.1403191149460.18573@ionos.tec.linutronix.de>
Date:	Wed, 19 Mar 2014 11:59:00 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Stanislav Fomichev <stfomichev@...dex-team.ru>
cc:	john.stultz@...aro.org, prarit@...hat.com,
	paul.gortmaker@...driver.com, juri.lelli@...il.com,
	ddaney.cavm@...il.com, mbohan@...eaurora.org,
	david.vrabel@...rix.com, david.engraf@...go.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] hrtimers: calculate expires_next after all timers are
 executed

On Mon, 17 Mar 2014, Stanislav Fomichev wrote:

> I think I'm hitting particularly subtle issue with NOHZ_IDLE kernel.
> 
> The sequence is as follows:
> - CPU enters idle, we disable tick
> - hrtimer interrupt fires (for hrtimer_wakeup)
> - for clock base #1 (REALTIME) we wake up SCHED_RT thread and
>   start RT period timer (from start_rt_bandwidth) for clock base #0 (MONOTONIC)
> - because we already checked expiry time for clock base #0
>   we end up programming wrong wake up time (minutes, from tick_sched_timer)
> - then, we exit idle loop and restart tick;
>   but because tick_sched_timer is not leftmost (the leftmost one
>   is RT period timer) we don't program it
> 
> So in the end, I see working CPU without tick interrupt.
> This eventually leads to RCU stall on that CPU: rcu_gp_kthread
> is not woken up because there is no tick (this is the reason
> I started digging this up).

Nice analysis!
 
> The proposed fix runs expired timers first and only then tries to find
> next expiry time for all clocks.
> 
> Signed-off-by: Stanislav Fomichev <stfomichev@...dex-team.ru>
> ---
>  kernel/hrtimer.c | 38 ++++++++++++++++++++++++--------------
>  1 file changed, 24 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 09094361dce5..63805f9f9531 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -1282,6 +1282,9 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
>   */
>  void hrtimer_interrupt(struct clock_event_device *dev)
>  {
> +	struct hrtimer_clock_base *base;
> +	struct timerqueue_node *node;
> +	struct hrtimer *timer;
>  	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
>  	ktime_t expires_next, now, entry_time, delta;
>  	int i, retries = 0;
> @@ -1304,8 +1307,6 @@ retry:
>  	cpu_base->expires_next.tv64 = KTIME_MAX;
>  
>  	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
> -		struct hrtimer_clock_base *base;
> -		struct timerqueue_node *node;
>  		ktime_t basenow;
>  
>  		if (!(cpu_base->active_bases & (1 << i)))
> @@ -1315,8 +1316,6 @@ retry:
>  		basenow = ktime_add(now, base->offset);
>  
>  		while ((node = timerqueue_getnext(&base->active))) {
> -			struct hrtimer *timer;
> -
>  			timer = container_of(node, struct hrtimer, node);
>  
>  			/*
> @@ -1332,22 +1331,33 @@ retry:
>  			 * timer will have to trigger a wakeup anyway.
>  			 */
>  
> -			if (basenow.tv64 < hrtimer_get_softexpires_tv64(timer)) {
> -				ktime_t expires;
> -
> -				expires = ktime_sub(hrtimer_get_expires(timer),
> -						    base->offset);
> -				if (expires.tv64 < 0)
> -					expires.tv64 = KTIME_MAX;
> -				if (expires.tv64 < expires_next.tv64)
> -					expires_next = expires;
> +			if (basenow.tv64 < hrtimer_get_softexpires_tv64(timer))
>  				break;
> -			}
>  
>  			__run_hrtimer(timer, &basenow);
>  		}
>  	}
>  
> +	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
> +		ktime_t expires;
> +
> +		if (!(cpu_base->active_bases & (1 << i)))
> +			continue;
> +
> +		base = cpu_base->clock_base + i;
> +
> +		node = timerqueue_getnext(&base->active);
> +		if (!node)
> +			continue;
> +
> +		timer = container_of(node, struct hrtimer, node);
> +		expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
> +		if (expires.tv64 < 0)
> +			expires.tv64 = KTIME_MAX;
> +		if (expires.tv64 < expires_next.tv64)
> +			expires_next = expires;

Actually we should store the expiry time of the first timer in a base
in the base itself. That needs to be done at enqueue time and

> +			if (basenow.tv64 < hrtimer_get_softexpires_tv64(timer))
>  				break;

here right before we break out of handling the base. When the last
timer of the base goes away, set the value to KTIME_MAX.

If we store it with the base offset already applied

   base->next = ktime_sub(hrtimer_get_expires(timer), base->offset); 

then the loop to find the next expiry in the interrupt becomes trivial
and fast.

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ