Message-ID: <20171012121658.187c5af6@gandalf.local.home>
Date:   Thu, 12 Oct 2017 12:16:58 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     LKML <linux-kernel@...r.kernel.org>
Cc:     Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>
Subject: NMI watchdog dump does not print on hard lockup

While preparing my presentation for ELC and OSS in Prague in a couple of
weeks, I noticed an issue with the printk_safe logic. Then I wrote some
code to see if my fears were justified.

I noticed that NMI printks now depend on an irq_work triggering to
flush out the data that was written by printks during the NMI. But if
the irq_work can't trigger, nothing will make it out to the console.

To test this, I first added the following to the write function of
/sys/kernel/debug/tracing/free_buffer:

	/* Deliberate double acquire: deadlock on purpose */
	raw_spin_lock(&global_trace.start_lock);
	raw_spin_lock(&global_trace.start_lock);
	raw_spin_unlock(&global_trace.start_lock);
	raw_spin_unlock(&global_trace.start_lock);

That way I could trigger a lockup (in this case a soft lockup, since
interrupts remain enabled) whenever I wanted to.

 # echo 1 > /sys/kernel/debug/tracing/free_buffer

Sure enough, within a minute of doing this, the soft lockup
warning triggered.

Then I changed it to:

	raw_spin_lock_irq(&global_trace.start_lock);
	raw_spin_lock(&global_trace.start_lock);
	raw_spin_unlock(&global_trace.start_lock);
	raw_spin_unlock_irq(&global_trace.start_lock);

And to my surprise, the hard lockup warning triggered. But then I
noticed that the lockup was detected by another CPU. So I changed
this to:

static void lock_up_cpu(void *data)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&global_trace.start_lock, flags);
	raw_spin_lock(&global_trace.start_lock);
	raw_spin_unlock(&global_trace.start_lock);
	raw_spin_unlock_irqrestore(&global_trace.start_lock, flags);
}

[..]

	on_each_cpu(lock_up_cpu, NULL, 1);

This too triggered the warning. But I noticed that the calling CPU
didn't hard lock up (not all CPUs were hard-locked).

Finally I did:

	on_each_cpu(lock_up_cpu, NULL, 0);
	lock_up_cpu(tr);

And boom! It locked up (lockdep was enabled, so I could see it report
the deadlock), but then it stopped there. No output. The NMI watchdog
will only detect hard lockups if there is at least one CPU that is
still active. This could be an issue on non-SMP boxes.

We need a way for the NMI path to flush to the consoles when a lockup
is detected, rather than depending on an irq_work to do so.

I'll update my presentation to discuss this flaw ;-)

-- Steve
