Message-ID: <CA+55aFyRdYDzj_d42paCy-Tg33ysEmkTVVZ+A97b+-0LF_6Yeg@mail.gmail.com>
Date:	Wed, 19 Nov 2014 11:38:09 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Dave Jones <davej@...hat.com>, Don Zickus <dzickus@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	"the arch/x86 maintainers" <x86@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>
Subject: Re: frequent lockups in 3.18rc4

On Wed, Nov 19, 2014 at 11:15 AM, Andy Lutomirski <luto@...capital.net> wrote:
>
> I suspect that the regression was triggered by the seccomp pull, since
> that reworked a lot of this code.

Note that it turns out that Dave can apparently see the same problems
with 3.17, so it's not actually a regression. So it may have been
going on for a while.


> Just to make sure I understand: it says "NMI watchdog", but this trace
> is from a timer interrupt, not NMI, right?

Yeah. The kernel/watchdog.c code always says "NMI watchdog", but it's
actually just a regular timer function: watchdog_timer_fn() started
with hrtimer_start().
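
As a rough illustration of that point, here is a user-space sketch of the
kind of check a timer-driven soft-lockup detector performs. It is not the
kernel/watchdog.c code: the struct, the helper names, and the 20-second
threshold used when exercising it are all made up for the example, and time
is passed in explicitly instead of coming from a real hrtimer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-CPU state for the sketch. The real kernel keeps
 * similar information in per-CPU variables, but these names are invented.
 */
struct cpu_watchdog {
    uint64_t last_touch_sec; /* last time the CPU proved it could schedule */
    uint64_t thresh_sec;     /* how long a stall is tolerated */
};

/* Called from normal (schedulable) context to record forward progress. */
void watchdog_touch(struct cpu_watchdog *wd, uint64_t now_sec)
{
    wd->last_touch_sec = now_sec;
}

/*
 * Called from the periodic timer callback. Returns true when the CPU has
 * gone longer than thresh_sec without a touch, i.e. a suspected lockup.
 * No NMI is involved: this only runs when the timer interrupt can fire.
 */
bool watchdog_timer_check(const struct cpu_watchdog *wd, uint64_t now_sec)
{
    return now_sec - wd->last_touch_sec > wd->thresh_sec;
}
```

With thresh_sec set to 20 and the last touch at t=100, a timer tick at
t=120 reports nothing, while a tick at t=121 flags a stall. Because the
check itself runs from a timer interrupt, it cannot fire on a CPU that is
spinning with interrupts disabled.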

> Is it possible that we've managed to return to userspace with
> interrupts off somehow?  A loop in userspace that somehow has
> interrupts off can cause all kinds of fun lockups.

That sounds unlikely, but it's possible if there is some stack corruption
going on.

However, it wouldn't even explain things, because even if interrupts
had been disabled in user space, and even if that popf got executed,
this wouldn't be where they got enabled. That would be the "sti" in
the system call entry path (hidden behind the ENABLE_INTERRUPTS
macro).

Of course, maybe Dave has paravirtualization enabled (what a crock
_that_ is), and there is something wrong with that whole code.

> I don't understand the logic of what enables TIF_NOHZ.

Yeah, that makes two of us.  But..

> In 3.17, I don't think that code would run with context tracking on,
> although I don't immediately see any bugs here.

See above: the problem apparently isn't new. Although it is possible
that we have two different issues going on..

                      Linus
