linux-kernel - Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)

Message-ID: <1367345295.30667.68.camel@gandalf.local.home>
Date:	Tue, 30 Apr 2013 14:08:15 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:	Clark Williams <williams@...hat.com>,
	linux-rt-users <linux-rt-users@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)

On Tue, 2013-04-30 at 19:09 +0200, Sebastian Andrzej Siewior wrote:

> The next thing that happens is that RCU assumes nobody is making any
> progress (for almost 28 secs) and triggers NMIs & printks to get some
> attention. I have a trace where
> - CPU0: arch_trigger_all_cpu_backtrace_handler() => printk()
>         has "lock" and is spinning for logbuf_lock
> 
> - CPU1: print_cpu_stall() => printk() (spinning for the lock) => NMI =>
>   arch_trigger_all_cpu_backtrace_handler()
>         it may have logbuf_lock and is spinning for "lock"
> 
> I can't tell if CPU1 got the logbuf_lock at this time, but it seemed that
> it made no progress until I ended it.
> This NMI-related deadlock is a problem which should also trigger on
> mainline, right?

Well, yeah, as sending out an NMI stack dump is sorta the last resort,
and it is dangerous to do printks from NMI context.
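The trace above describes an AB-BA lock inversion: CPU0 holds "lock" and spins on logbuf_lock, while CPU1 (assuming it really did hold logbuf_lock, which the trace could not confirm) is interrupted by the NMI and spins on "lock". The sketch below is a hypothetical user-space model of that cycle, not kernel code: pthread mutexes stand in for the spinlocks, two threads stand in for the CPUs, and the spin is bounded so the program terminates instead of hanging. The names cpu_thread() and simulate_abba() are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: these mutexes stand in for the kernel's "lock"
 * (backtrace handler) and logbuf_lock (printk). */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t logbuf_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t holding, done_trying;

struct cpu {
    pthread_mutex_t *held;   /* lock this "CPU" already owns */
    pthread_mutex_t *wanted; /* lock it then spins on */
    bool got_it;             /* did the bounded spin ever succeed? */
};

static void *cpu_thread(void *arg)
{
    struct cpu *c = arg;

    pthread_mutex_lock(c->held);
    pthread_barrier_wait(&holding); /* both CPUs now hold their first lock */

    c->got_it = false;
    /* Bounded so the model terminates; a real spinlock would spin forever. */
    for (int spin = 0; spin < 100000; spin++) {
        if (pthread_mutex_trylock(c->wanted) == 0) {
            c->got_it = true;
            pthread_mutex_unlock(c->wanted);
            break;
        }
    }

    pthread_barrier_wait(&done_trying); /* nobody releases until both gave up */
    pthread_mutex_unlock(c->held);
    return NULL;
}

/* Returns true if neither CPU could take its second lock (AB-BA deadlock). */
static bool simulate_abba(void)
{
    struct cpu cpu0 = { &lock, &logbuf_lock, false }; /* backtrace handler -> printk */
    struct cpu cpu1 = { &logbuf_lock, &lock, false }; /* printk -> NMI handler */
    pthread_t t0, t1;

    pthread_barrier_init(&holding, NULL, 2);
    pthread_barrier_init(&done_trying, NULL, 2);
    pthread_create(&t0, NULL, cpu_thread, &cpu0);
    pthread_create(&t1, NULL, cpu_thread, &cpu1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    pthread_barrier_destroy(&holding);
    pthread_barrier_destroy(&done_trying);

    return !cpu0.got_it && !cpu1.got_it;
}
```

Calling simulate_abba() returns true: between the two barriers each thread keeps its first lock, so every pthread_mutex_trylock() on the other lock fails with EBUSY. On real hardware neither side gives up, which matches the observation that CPU1 made no progress until the run was ended.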

> 
> Now, the time jump on the other hand is the real issue here and is
> RT-only. It looks like we get a big number of timer updates via
> tick_do_update_jiffies64() because according to ktime_get() that much
> time really passed by.
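To make "a big number of timer updates" concrete, here is a simplified, hypothetical model of the accounting step: it counts every whole tick between the last update and the current time and adds them to jiffies in one call. It assumes HZ=1000 and uses plain nanosecond integers; the real tick_do_update_jiffies64() works on ktime_t values under locking that this model omits, and update_jiffies() and struct tick_state are illustrative names only.

```c
#include <stdint.h>

/* Assumed configuration for the model: HZ=1000, i.e. a 1 ms tick. */
#define MODEL_HZ 1000
#define TICK_PERIOD_NS (1000000000LL / MODEL_HZ)

struct tick_state {
    int64_t last_update_ns; /* time of the last accounted tick */
    uint64_t jiffies;       /* ticks accounted so far */
};

/* Account every whole tick between last_update_ns and now_ns in one call
 * and return how many ticks were added. */
static uint64_t update_jiffies(struct tick_state *s, int64_t now_ns)
{
    int64_t delta = now_ns - s->last_update_ns;

    if (delta < TICK_PERIOD_NS)
        return 0; /* less than one full tick has elapsed */

    uint64_t ticks = (uint64_t)(delta / TICK_PERIOD_NS);
    s->last_update_ns += (int64_t)ticks * TICK_PERIOD_NS;
    s->jiffies += ticks;
    return ticks;
}
```

With this model, a clock reading that jumps ahead by 28 seconds yields 28000 ticks from a single call, so anything that measures elapsed time in jiffies sees almost half a minute pass at once. That is consistent with RCU concluding that nobody had made progress for almost 28 seconds.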

As the NMI dump only happens because of the time jump, which, as you
said, is -rt only, I wouldn't say that the NMI deadlock is a mainline
bug.

-- Steve
