linux-kernel - Re: Bug 4.1.16: self-detected stall in net/unix/?

Message-ID: <56BC90E7.7040007@pmhahn.de>
Date:	Thu, 11 Feb 2016 14:47:19 +0100
From:	Philipp Hahn <pmhahn@...ahn.de>
To:	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	Sasha Levin <sasha.levin@...cle.com>,
	Rainer Weikusat <rweikusat@...ileactivedefense.com>,
	"David S. Miller" <davem@...emloft.net>,
	linux-kernel@...r.kernel.org, Karolin Seeger <kseeger@...ba.org>,
	Jason Baron <jbaron@...mai.com>,
	Ben Hutchings <ben@...adent.org.uk>
Cc:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Arvid Requate <requate@...vention.de>,
	Stefan Gohmann <gohmann@...vention.de>
Subject: Re: Bug 4.1.16: self-detected stall in net/unix/?

Hi,

Am 05.02.2016 um 16:28 schrieb Philipp Hahn:
> Am 03.02.2016 um 02:43 schrieb Hannes Frederic Sowa:
>> On 02.02.2016 17:25, Philipp Hahn wrote:
>>> we recently updated our kernel to 4.1.16 + patch for "unix: properly
>>> account for FDs passed over unix sockets" and have since then
>>> self-detected stalls triggered by the Samba daemon:
> ...
>>> We have not yet been able to reproduce the hang, but going back to our
>>> previous kernel 4.1.12 makes the problem go away.
>>
>> Can you remove the patch "unix: properly account for FDs passed over
>> unix sockets" and see if the problem still happens?
> 
> I will try.
> The problem is that I can't trigger the bug reliably. It always happens
> to "smbd", but I don't know the triggering condition.

Probably the same bug was also reported to samba-technical by Karolin
Seeger, who filed it with Ubuntu against the 3.19-ckt kernel:

<https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1543980>

Running the Samba test suite reproduces the problem; see the bug report
for details.


> I will for now build a new kernel with
>> $ git log --oneline  v4.1.12..v4.1.17 -- net/unix
>> dc6b0ec unix: properly account for FDs passed over unix sockets
>> cc01a0a af_unix: Revert 'lock_interruptible' in stream receive code
>> 5c77e26 unix: avoid use-after-free in ep_remove_wait_queue
> reverted to see if it still happens. The "middle" patch seems harmless,
> as it only changes a code path for STREAMS, while the bug triggers with
> DGRAMS only.
> 
>> The stack trace is rather unreliable, maybe something completely
>> different happened. Do you happen to see better reports?
> 
> So far they look all the same.
> Is there anything more I can do to prepare for collecting better
> information the next time I hit that bug?

I've enabled more kernel debug options and got the following:

> [  598.482787] 
> [  598.492559] =====================================
> [  598.502646] [ BUG: bad unlock balance detected! ]
> [  598.512874] 4.1.16+ #24 Not tainted
> [  598.523134] -------------------------------------
> [  598.533592] smbd/8659 is trying to release lock (&(&u->lock)->rlock) at:
> [  598.544429] [<ffffffff815d1319>] spin_unlock+0x9/0x10
> [  598.555148] but there are no more locks to release!
> [  598.565892] 
> [  598.565892] other info that might help us debug this:
> [  598.586936] no locks held by smbd/8659.
> [  598.597478] 
> [  598.597478] stack backtrace:
> [  598.618275] CPU: 3 PID: 8659 Comm: smbd Not tainted 4.1.16+ #24
> [  598.628820] Hardware name: System manufacturer System Product Name/P7F-X Series, BIOS 0703    09/24/2010
> [  598.650020]  ffffffff815d1319 ffff8800b8efbb88 ffffffff8163ee73 0000000000000000
> [  598.661051]  ffff880034fc4110 ffff8800b8efbbb8 ffffffff810db540 ffff880034fc4110
> [  598.671990]  ffff880034fc4110 ffff88023206bd40 ffffffff815d1319 ffff8800b8efbc08
> [  598.682736] Call Trace:
> [  598.693187]  [<ffffffff815d1319>] ? spin_unlock+0x9/0x10
> [  598.703798]  [<ffffffff8163ee73>] dump_stack+0x4c/0x65
> [  598.714223]  [<ffffffff810db540>] print_unlock_imbalance_bug+0x100/0x110
> [  598.724611]  [<ffffffff815d1319>] ? spin_unlock+0x9/0x10
> [  598.734763]  [<ffffffff810e0d8e>] lock_release+0x2be/0x430
> [  598.744636]  [<ffffffff81648303>] _raw_spin_unlock+0x23/0x40
> [  598.754230]  [<ffffffff815d41a8>] ? unix_dgram_sendmsg+0x288/0x6f0
> [  598.763840]  [<ffffffff815d1319>] spin_unlock+0x9/0x10
> [  598.773126]  [<ffffffff815d41e7>] unix_dgram_sendmsg+0x2c7/0x6f0
> [  598.782209]  [<ffffffff814f6c9d>] sock_sendmsg+0x4d/0x60
> [  598.791313]  [<ffffffff814f7c3b>] ___sys_sendmsg+0x2db/0x2f0
> [  598.800369]  [<ffffffff812083c8>] ? kmem_cache_free+0x328/0x360
> [  598.809383]  [<ffffffff8127c1c0>] ? locks_free_lock+0x50/0x60
> [  598.818157]  [<ffffffff814f8649>] __sys_sendmsg+0x49/0x90
> [  598.826742]  [<ffffffff814f86a2>] SyS_sendmsg+0x12/0x20
> [  598.835110]  [<ffffffff816486f2>] system_call_fastpath+0x16/0x7a
> [  598.843546] ------------[ cut here ]------------
> [  598.851999] WARNING: CPU: 3 PID: 8659 at net/core/skbuff.c:691 skb_release_head_state+0xaa/0xb0()
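
A note on reading the trace above: frames marked with "?" are addresses
that were found on the stack but could not be confirmed as part of the
call chain, so only the unmarked frames should be trusted. The following
is a minimal Python sketch for filtering them out, assuming the log is
available as text with the "> " quote prefix still attached. The names
reliable_frames, FRAME_RE and SAMPLE are hypothetical and exist only for
this illustration; the sample lines are copied from the log above.

```python
import re

# Matches one call-trace line such as
#   [  598.703798]  [<ffffffff8163ee73>] dump_stack+0x4c/0x65
# Group 1 is the optional "? " marker, group 2 the symbol name.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(log_text):
    """Return symbols of frames after 'Call Trace:' that carry no '?'."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue  # ignore everything before the trace starts
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a frame line
        if match.group(1) is None:  # no "?" marker: confirmed frame
            frames.append(match.group(2))
    return frames


# A few lines copied from the report above.
SAMPLE = """\
> [  598.682736] Call Trace:
> [  598.744636]  [<ffffffff81648303>] _raw_spin_unlock+0x23/0x40
> [  598.754230]  [<ffffffff815d41a8>] ? unix_dgram_sendmsg+0x288/0x6f0
> [  598.763840]  [<ffffffff815d1319>] spin_unlock+0x9/0x10
> [  598.773126]  [<ffffffff815d41e7>] unix_dgram_sendmsg+0x2c7/0x6f0
"""

if __name__ == "__main__":
    for symbol in reliable_frames(SAMPLE):
        print(symbol)
```

For the sample lines this keeps _raw_spin_unlock, spin_unlock and the
second unix_dgram_sendmsg entry (+0x2c7), and drops the "?" entry at
+0x288.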

I continued bisecting "v4.1.12..v4.1.17 -- net/unix/" and found that
reverting the attached patch in 4.1 fixes my problem (for now). The
original patch went into 4.4 and was back-ported to several stable trees:

v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
v3.18: 72032798034d921ed565e3bf8dfdc3098f6473e2
v4.1: 5c77e26862ce604edea05b3442ed765e9756fe0f
v4.2: bad967fdd8ecbdd171f5f243657be033d2d081a7
v4.3: 58a6a46a036ce81a2a8ecaa6fc1537c894349e3f
v4.4: 7d267278a9ece963d77eefec61630223fce08c6c

Philipp

View attachment "0001-unix-avoid-use-after-free-in-ep_remove_wait_queue.patch" of type "text/x-diff" (10381 bytes)