Message-ID: <CA+55aFym2UfWnXZw0NjA70Q575eybiAOUkx==3Ci+V43u1-ZNQ@mail.gmail.com>
Date:	Wed, 19 Nov 2014 09:40:26 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Dave Jones <davej@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Don Zickus <dzickus@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	"the arch/x86 maintainers" <x86@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Andy Lutomirski <luto@...capital.net>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>
Subject: Re: frequent lockups in 3.18rc4

On Wed, Nov 19, 2014 at 9:22 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> So it hasn't actually done the "push %rbx; popfq" part - there must be
> a label at the return part, and context_tracking_user_exit() never
> actually did the local_irq_save/restore at all. Which means that it
> took one of the early exits instead:
>
>         if (!context_tracking_is_enabled())
>                 return;
>
>         if (in_interrupt())
>                 return;

Ho humm. Interesting. Neither of those should possibly have happened.

We "know" that "context_tracking_is_enabled()" must be true, because
the only way we get to context_tracking_user_exit() in the first place
is through "user_exit()", which does:

        if (context_tracking_is_enabled())
                context_tracking_user_exit();

and we know we shouldn't be in_interrupt(), because the backtrace is
the system call entry path, for chrissake!

So we definitely have some corruption going on. A few possibilities:

 - either the register contents are corrupted (%rbx in your dump said
"0x0000000100000046", but the eflags we restored was 0x246)

 - in_interrupt() is wrong, and we've had some irq_count() corruption.
I'd expect that to result in "scheduling while atomic" messages,
though, especially if it goes on long enough that you get a watchdog
event..

 - there is something rotten in the land of
context_tracking_is_enabled(), which uses a static key.

 - I have misread the whole trace, and am a moron. But your earlier
report really had some very similar things, just in
context_tracking_user_enter() instead of exit.

In your previous oops, the register that was allegedly used to
restore %eflags was %r12:

  28: 41 54                 push   %r12
  2a: 9d                   popfq
  2b:* 5b                   pop    %rbx <-- trapping instruction
  2c: 41 5c                 pop    %r12
  2e: 5d                   pop    %rbp
  2f: c3                   retq

but:

  R12: ffff880101ee3ec0
  EFLAGS: 00000282

so again, it looks like we never actually did that "popfq"
instruction, and it would have exited through the (same) early exits.

But what an odd coincidence that it ended up in both of your reports
being *exactly* at that instruction after the "popf". If it had
actually *taken* the popf, I'd not be so surprised ("ok, popf enabled
interrupts, and there was an interrupt pending"), but since everything
seems to say that it came there through some control flow that did
*not* go through the popf, that's just a very odd coincidence.

And both context_tracking_user_enter() and exit() have that exact same
issue with the early returns. They shouldn't have happened in the
first place.

                      Linus