Date:	Mon, 30 Mar 2015 21:46:12 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Chris J Arges <chris.j.arges@...onical.com>
Cc:	Rafael David Tinoco <inaddy@...ntu.com>,
	Ingo Molnar <mingo@...nel.org>, Peter Anvin <hpa@...or.com>,
	Jiang Liu <jiang.liu@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Gema Gomez <gema.gomez-solano@...onical.com>,
	"the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: smp_call_function_single lockups

On Mon, Mar 30, 2015 at 8:15 PM, Chris J Arges
<chris.j.arges@...onical.com> wrote:
> [   13.613531] WARNING: CPU: 0 PID: 0 at ./arch/x86/include/asm/apic.h:444 apic_ack_edge+0x84/0x90()
> [   13.613531]  [<ffffffff8104d3f4>] apic_ack_edge+0x84/0x90
> [   13.613531]  [<ffffffff810cf8e7>] handle_edge_irq+0x57/0x120
> [   13.613531]  [<ffffffff81016aa2>] handle_irq+0x22/0x40
> [   13.613531]  [<ffffffff817a3b9f>] do_IRQ+0x4f/0x140
> [   13.613531]  [<ffffffff817a196d>] common_interrupt+0x6d/0x6d
> [   13.613531]  <EOI>  [<ffffffff810def08>] ? hrtimer_start+0x18/0x20
> [   13.613531]  [<ffffffff8105a356>] ? native_safe_halt+0x6/0x10
> [   13.613531]  [<ffffffff810d5623>] ? rcu_eqs_enter+0xa3/0xb0
> [   13.613531]  [<ffffffff8101ecde>] default_idle+0x1e/0xc0

Hmm. I didn't notice that "hrtimer_start" was always there as a stale
entry on the stack when this happened.

That may well be immaterial - the CPU being idle means that the last
thing it did before going to sleep was likely that "start timer"
thing, but it's interesting even so.

Perhaps some issue with reprogramming the hrtimer as it is triggering,
similar to the bootup case I saw where the keyboard init sequence
raises an interrupt that was already cleared by the time the interrupt
happened.

So maybe something like this happens:

 - local timer is about to go off and raises the interrupt line

 - in the meantime, we're reprogramming the timer into the future

 - the CPU takes the interrupt, but now the timer has been
reprogrammed, so the irq line is no longer active, and ISR is zero even
though we took the interrupt (which is why the new warning triggers)

 - we're running the local timer interrupt (which happened due to the
*old* programmed value), but we do something wrong because when we
read the timer state, we see the *new* programmed value and so we
think that it's the new timer that triggered.
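As a rough illustration only, the four steps above can be replayed in a toy Python model. Everything in it is an assumption: the `ToyTimerState` class and helper functions are hypothetical, the rule that reprogramming the timer deasserts its interrupt line is a simplification taken from the hypothesis, and the new warning is modeled as "interrupt taken while the line is no longer active". None of this describes real LAPIC or hrtimer behavior.

```python
from dataclasses import dataclass


@dataclass
class ToyTimerState:
    """Hypothetical toy model of one CPU's local timer (not real LAPIC semantics)."""
    now: int                   # current time, arbitrary units
    deadline: int              # currently programmed expiry time
    line_active: bool = False  # timer interrupt line asserted
    irq_taken: bool = False    # CPU has committed to taking the interrupt


def ack_sees_missing_isr(s: ToyTimerState) -> bool:
    # Assumption: the new warning amounts to "interrupt taken, but the
    # line (and hence the in-service state) is no longer active".
    return s.irq_taken and not s.line_active


def handler_thinks_expired(s: ToyTimerState) -> bool:
    # The handler can only read the *current* timer programming.
    return s.now >= s.deadline


def replay(old_deadline: int, new_deadline: int, reprogram: bool = True) -> tuple[bool, bool]:
    s = ToyTimerState(now=old_deadline, deadline=old_deadline)
    # 1. The local timer is about to go off and raises the interrupt line.
    s.line_active = True
    # 2. Meanwhile the timer is reprogrammed into the future; in this toy
    #    model, reprogramming also deasserts the line.
    if reprogram:
        s.deadline = new_deadline
        s.line_active = False
    # 3. The CPU takes the interrupt that was raised in step 1.
    s.irq_taken = True
    warn = ack_sees_missing_isr(s)
    # 4. The handler runs because of the *old* deadline but reads the new one.
    expired = handler_thinks_expired(s)
    return warn, expired


warn, expired = replay(old_deadline=100, new_deadline=200)
print(f"warning triggers: {warn}")                  # warning triggers: True
print(f"handler sees an expired timer: {expired}")  # handler sees an expired timer: False
```

With `reprogram=False` the same model takes the interrupt cleanly and the handler sees an expired timer; only the interleaved reprogramming produces both the warning and the mismatched view of the timer state.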

I dunno. I don't see why we'd lock up, but DaveJ's old lockup had
several signs that it seemed to be timer-related.

It would be interesting to see the actual irq number. Maybe this has
nothing whatsoever to do with the hrtimer.

                          Linus