linux-kernel - Re: [RFC PATCH] hrtimer: remove deadlock due to waiting on IPI in softirq context


Date:	Wed, 5 Mar 2014 22:51:02 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Rik van Riel <riel@...hat.com>
cc:	linux-kernel@...r.kernel.org, Mateusz Guzik <mguzik@...hat.com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Ingo Molnar <mingo@...hat.com>,
	Prarit Bhargava <prarit@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Clark Williams <williams@...hat.com>
Subject: Re: [RFC PATCH] hrtimer: remove deadlock due to waiting on IPI in
 softirq context

On Wed, 5 Mar 2014, Rik van Riel wrote:
> There appears to be a deadlock in the hrtimer code. Specifically,
> clock_was_set() calls an IPI with wait=1, from softirq context.

This should not be called from softirq context.
 
> Waiting for IPIs to complete in irq context can lead to a deadlock,
> because the current code (that was interrupted) might be holding some
> kind of lock, that another CPU is waiting for with spin_lock_irq or
> similar.
> 
> In other words, the current CPU may need to release a resource, before
> the IPI can be handled by one of the destination CPUs.
> 
> To my untrained eye, it does not look like this patch introduces a
> new bug to the timer code, but that is hard to ascertain with the
> timer code, so I am posting this as an RFC for the timer gods to hurt
> their brains on :)
> 
> This bug was introduced by 54cdfdb4 in early 2007 (the original
> hrtimer code patch).

Right, and we had some issues with that until we moved the calls to
clock_was_set() out of lock held regions.

The only call which happens from interrupt context is in
update_wall_time(). And that one definitely holds no locks which are
relevant.

On which kernel are you observing the issue?

Can you provide the debug info which made you look into this?

Thanks,

	tglx
 
> Not-yet-signed-off-by: Rik van Riel <riel@...hat.com>
> Reported-by: Mateusz Guzik <mguzik@...hat.com>
> Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Prarit Bhargava <prarit@...hat.com>
> Cc: Frederic Weisbecker <fweisbec@...il.com>
> Cc: Clark Williams <williams@...hat.com>
> ---
>  kernel/hrtimer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 0909436..19145ec 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -771,7 +771,7 @@ void clock_was_set(void)
>  {
>  #ifdef CONFIG_HIGH_RES_TIMERS
>  	/* Retrigger the CPU local events everywhere */
> -	on_each_cpu(retrigger_next_event, NULL, 1);
> +	on_each_cpu(retrigger_next_event, NULL, 0);
>  #endif
>  	timerfd_clock_was_set();
>  }
> 
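
For readers following the thread, the deadlock Rik describes can be sketched as a toy state machine in plain C. This is not kernel code and makes no claim about whether the scenario is actually reachable in a given kernel (that is exactly what is questioned above). It only models the argument: CPU0 interrupted code that holds a lock and then sends an IPI from softirq context, while CPU1 spins for that lock with interrupts disabled. All names here (struct model, step, ipi_completes) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the scenario in the patch description. */
struct model {
	bool lock_held_by_cpu0; /* lock taken by the code the softirq interrupted */
	bool cpu0_in_softirq;   /* CPU0 has not yet returned from the softirq */
	bool cpu1_irqs_off;     /* CPU1 spins for the lock with interrupts off */
	bool ipi_pending;       /* IPI sent to CPU1 but not yet handled */
};

/* Advance the model by one action; returns true if any CPU made progress. */
static bool step(struct model *m, bool wait)
{
	if (m->cpu0_in_softirq) {
		/* With wait set, CPU0 may only return once the IPI was handled. */
		if (!wait || !m->ipi_pending) {
			m->cpu0_in_softirq = false;
			return true;
		}
	} else if (m->lock_held_by_cpu0) {
		/* The interrupted code resumes and drops the lock. */
		m->lock_held_by_cpu0 = false;
		return true;
	}

	if (m->cpu1_irqs_off) {
		/* CPU1 gets the lock only after CPU0 released it, then
		 * finishes its critical section and re-enables interrupts. */
		if (!m->lock_held_by_cpu0) {
			m->cpu1_irqs_off = false;
			return true;
		}
	} else if (m->ipi_pending) {
		/* Interrupts are on again, so CPU1 can handle the IPI. */
		m->ipi_pending = false;
		return true;
	}

	return false; /* nobody can move */
}

/* Returns true if the IPI is eventually handled, false if the model is stuck. */
static bool ipi_completes(bool wait)
{
	struct model m = {
		.lock_held_by_cpu0 = true,
		.cpu0_in_softirq = true,
		.cpu1_irqs_off = true,
		.ipi_pending = true,
	};

	while (step(&m, wait))
		;
	return !m.ipi_pending;
}
```

In this model, ipi_completes(true) gets stuck immediately because each CPU waits on the other, while ipi_completes(false) runs to completion. The sketch assumes the interrupted code on CPU0 really does hold a lock that CPU1 wants; if no such lock is held at the call site, the wait=1 case is harmless.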