Date:	Wed, 19 Mar 2014 22:56:59 -0700
From:	Davidlohr Bueso <davidlohr@...com>
To:	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	torvalds@...ux-foundation.org, tglx@...utronix.de,
	mingo@...nel.org, LKML <linux-kernel@...r.kernel.org>,
	linuxppc-dev@...ts.ozlabs.org, benh@...nel.crashing.org,
	paulus@...ba.org, Paul McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: Tasks stuck in futex code (in 3.14-rc6)

On Thu, 2014-03-20 at 11:03 +0530, Srikar Dronamraju wrote:
> > > Joy,.. let me look at that with ppc in mind.
> > 
> > OK; so while pretty much all the comments from that patch are utter
> > nonsense (what was I thinking), I cannot actually find a real bug.
> > 
> > But could you try the below which replaces a control dependency with a
> > full barrier. The control flow is plenty convoluted that I think the
> > control barrier isn't actually valid anymore and that might indeed
> > explain the fail.
> > 
> 
> Unfortunately the patch didn't help. Still seeing tasks stuck:
> 
> # ps -Ao pid,tt,user,fname,tmout,f,wchan | grep futex
> 14680 pts/0    root     java         - 0 futex_wait_queue_me
> 14797 pts/0    root     java         - 0 futex_wait_queue_me
> # :> /var/log/messages
> # echo t > /proc/sysrq-trigger 
> # grep futex_wait_queue_me /var/log/messages | wc -l 
> 334
> #
> 
> [ 6904.211478] Call Trace:
> [ 6904.211481] [c000000fa1f1b4d0] [0000000000000020] 0x20 (unreliable)
> [ 6904.211486] [c000000fa1f1b6a0] [c000000000015208] .__switch_to+0x1e8/0x330
> [ 6904.211491] [c000000fa1f1b750] [c000000000702f00] .__schedule+0x360/0x8b0
> [ 6904.211495] [c000000fa1f1b9d0] [c000000000147348] .futex_wait_queue_me+0xf8/0x1a0
> [ 6904.211500] [c000000fa1f1ba60] [c0000000001486dc] .futex_wait+0x17c/0x2a0
> [ 6904.211505] [c000000fa1f1bc10] [c00000000014a614] .do_futex+0x254/0xd80
> [ 6904.211510] [c000000fa1f1bd60] [c00000000014b25c] .SyS_futex+0x11c/0x1d0
> [ 6904.238874] [c000000fa1f1be30] [c00000000000a0fc] syscall_exit+0x0/0x7c
> [ 6904.238879] java            S 00003fff825f6044     0 14682  14076 0x00000080
> 
> Is there any other information that I provide that can help?

This problem suggests that we missed a wakeup for a task that was adding
itself to the queue in the wait path, and the only place that can happen
is the hb spinlock check for pending waiters. In case we missed some
assumption about checking the hash bucket spinlock as a way of detecting
waiters (powerpc?), could you revert this commit and try the original
atomic operations variant:

https://lkml.org/lkml/2013/12/19/630
