linux-kernel - Re: frequent lockups in 3.18rc4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20141119230051.GB11386@lerouge>
Date:	Thu, 20 Nov 2014 00:00:53 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Dave Jones <davej@...hat.com>, Don Zickus <dzickus@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	the arch/x86 maintainers <x86@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>
Subject: Re: frequent lockups in 3.18rc4

On Wed, Nov 19, 2014 at 11:03:48AM -0800, Andy Lutomirski wrote:
> On Wed, Nov 19, 2014 at 11:02 AM, Frederic Weisbecker
> <fweisbec@...il.com> wrote:
> > On Wed, Nov 19, 2014 at 09:40:26AM -0800, Linus Torvalds wrote:
> >> On Wed, Nov 19, 2014 at 9:22 AM, Linus Torvalds
> >> <torvalds@...ux-foundation.org> wrote:
> >> >
> >> > So it hasn't actually done the "push %rbx; popfq" part - there must be
> >> > a label at the return part, and context_tracking_user_exit() never
> >> > actually did the local_irq_save/restore at all. Which means that it
> >> > took one of the early exits instead:
> >> >
> >> >         if (!context_tracking_is_enabled())
> >> >                 return;
> >> >
> >> >         if (in_interrupt())
> >> >                 return;
> >>
> >> Ho humm. Interesting. Neither of those should possibly have happened.
> >>
> >> We "know" that "context_tracking_is_enabled()" must be true, because
> >> the only way we get to context_tracking_user_exit() in the first place
> >> is through "user_exit()", which does:
> >>
> >>         if (context_tracking_is_enabled())
> >>                 context_tracking_user_exit();
> >>
> >> and we know we shouldn't be in_interrupt(), because the backtrace is
> >> the system call entry path, for chrissake!
> >>
> >> So we definitely have some corruption going on. A few possibilities:
> >>
> >>  - either the register contents are corrupted (%rbx in your dump said
> >> "0x0000000100000046", but the eflags we restored was 0x246)
> >>
> >>  - in_interrupt() is wrong, and we've had some irq_count() corruption.
> >> I'd expect that to result in "scheduling while atomic" messages,
> >> though, especially if it goes on long enough that you get a watchdog
> >> event..
> >>
> >>  - there is something rotten in the land of
> >> context_tracking_is_enabled(), which uses a static key.
> >>
> >>  - I have misread the whole trace, and am a moron. But your earlier
> >> report really had some very similar things, just in
> >> context_tracking_user_enter() instead of exit.
> >>
> >> In your previous oops, the registers that was allegedly used to
> >> restore %eflags was %r12:
> >>
> >>   28: 41 54                 push   %r12
> >>   2a: 9d                   popfq
> >>   2b:* 5b                   pop    %rbx <-- trapping instruction
> >>   2c: 41 5c                 pop    %r12
> >>   2e: 5d                   pop    %rbp
> >>   2f: c3                   retq
> >>
> >> but:
> >>
> >>   R12: ffff880101ee3ec0
> >>   EFLAGS: 00000282
> >>
> >> so again, it looks like we never actually did that "popfq"
> >> instruction, and it would have exited through the (same) early exits.
> >>
> >> But what an odd coincidence that it ended up in both of your reports
> >> being *exactly* at that instruction after the "popf". If it had
> >> actually *taken* the popf, I'd not be so surprised ("ok, popf enabled
> >> interrupts, and there was an interrupt pending"), but since everything
> >> seems to say that it came there through some control flow that did
> >> *not* go through the popf, that's just a very odd coincidence.
> >>
> >> And both context_tracking_user_enter() and exit() have that exact same
> >> issue with the early returns. They shouldn't have happened in the
> >> first place.
> >
> > I got a report lately involving context tracking. Not sure if it's
> > the same here but the issue was that context tracking uses per cpu data
> > and per cpu allocation use vmalloc and vmalloc'ed area can fault due to
> > lazy paging.
> 
> Wait, what?  If something like kernel_stack ends with an unmapped pmd,
> we are well and truly screwed.

Note that's non-sleeping faults. So probably most places are fine except
a few of them that really don't want exception to mess up some state. I
can imagine some entry code that really don't want that.

Is kernel stack allocated by vmalloc or alloc_percpu()?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/