lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+55aFy8FeGMOnWwkfmyFYyVCw+k1LSVg7BG0tsTNZzqi5EKYg@mail.gmail.com>
Date:	Fri, 15 Feb 2013 10:34:13 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>
Cc:	Dave Jones <davej@...hat.com>, Hugh Dickins <hughd@...gle.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Paul McKenney <paul.mckenney@...aro.org>
Subject: Re: Debugging Thinkpad T430s occasional suspend failure.

On Fri, Feb 15, 2013 at 9:44 AM, Paul E. McKenney
<paulmck@...ux.vnet.ibm.com> wrote:
>
> This commit was designed to increase the probability of hitting the
> races described in http://lwn.net/Articles/453002/.  These races result
> in deadlocks involving the runqueue lock (and perhaps also the priority
> inheritance locks).  And yes, I most certainly should have described
> this in the commit message.  :-(

Ugh. That particular race seems to be because the softirq handling is
just crazy, and does the "wakeup_softirqd()" form interrupt context,
BUT HAS SPECIFICALLY BROKEN THE IRQ COUNTING!

Because it claims to do it from softirq context, which is pure
garbage. It's not actually in softirq context.

The whole hardirq -> softirq transition seems stupid. I'm sure I made
some serious mistake in cleaning it up, and there's probably some
missed tracepoint (or perhaps screwed-up lockdep annotation), but I
think the hardirq -> softirq preempt thing shoudl be done as an atomic
preempt downgrade, so that we never have a window of "uhhuh, another
interrupt can come in between and see us as being in neither). And the
wakeup_softirqd should be done without playing with preempt count at
all.

Something like this ENTIRELY UNTESTED patch.

Note: I doubt this patch affects Dave's issue at all, I just started
looking at that do_softirq code when I read your bug explanation.

Adding random people for kernel/softirq.c to the participants list.
Comments about the patch? Do note that it's entirely untested, so
consider it more a RFD than a real patch.. It looks like it adds a lot
of lines, but most of it is for comments and simplification of the
logic.

             Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ