Message-ID: <56BC90E7.7040007@pmhahn.de>
Date: Thu, 11 Feb 2016 14:47:19 +0100
From: Philipp Hahn <pmhahn@...ahn.de>
To: Hannes Frederic Sowa <hannes@...essinduktion.org>,
Sasha Levin <sasha.levin@...cle.com>,
Rainer Weikusat <rweikusat@...ileactivedefense.com>,
"David S. Miller" <davem@...emloft.net>,
linux-kernel@...r.kernel.org, Karolin Seeger <kseeger@...ba.org>,
Jason Baron <jbaron@...mai.com>,
Ben Hutchings <ben@...adent.org.uk>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Arvid Requate <requate@...vention.de>,
Stefan Gohmann <gohmann@...vention.de>
Subject: Re: Bug 4.1.16: self-detected stall in net/unix/?
Hi,
On 05.02.2016 16:28, Philipp Hahn wrote:
> On 03.02.2016 02:43, Hannes Frederic Sowa wrote:
>> On 02.02.2016 17:25, Philipp Hahn wrote:
>>> we recently updated our kernel to 4.1.16 + patch for "unix: properly
>>> account for FDs passed over unix sockets" and have since then seen
>>> self-detected stalls triggered by the Samba daemon:
> ...
>>> We have not yet been able to reproduce the hang, but going back to our
>>> previous kernel 4.1.12 makes the problem go away.
>>
>> Can you remove the patch "unix: properly account for FDs passed over
>> unix sockets" and see if the problem still happens?
>
> I will try.
> The problem is that I can't trigger the bug reliably. It always happens
> to "smbd", but I don't know the triggering condition.
Probably the same bug was also reported to samba-technical by Karolin
Seeger; she filed the bug for 3.19-ckt with Ubuntu:
<https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1543980>
Running the Samba test suite reproduces the problem; see the bug report
for details.
> I will for now build a new kernel with
>> $ git log --oneline v4.1.12..v4.1.17 -- net/unix
>> dc6b0ec unix: properly account for FDs passed over unix sockets
>> cc01a0a af_unix: Revert 'lock_interruptible' in stream receive code
>> 5c77e26 unix: avoid use-after-free in ep_remove_wait_queue
> reverted to see if it still happens. The "middle" patch seems harmless,
> as it only changes a code path for SOCK_STREAM sockets, while the bug
> triggers only with SOCK_DGRAM sockets.
>
>> The stack trace is rather unreliable, maybe something completely
>> different happened. Do you happen to see better reports?
>
> So far they look all the same.
> Is there anything more I can do to prepare for collecting better
> information the next time I hit that bug?
I've enabled more kernel debug options and got the following:
> [ 598.482787]
> [ 598.492559] =====================================
> [ 598.502646] [ BUG: bad unlock balance detected! ]
> [ 598.512874] 4.1.16+ #24 Not tainted
> [ 598.523134] -------------------------------------
> [ 598.533592] smbd/8659 is trying to release lock (&(&u->lock)->rlock) at:
> [ 598.544429] [<ffffffff815d1319>] spin_unlock+0x9/0x10
> [ 598.555148] but there are no more locks to release!
> [ 598.565892]
> [ 598.565892] other info that might help us debug this:
> [ 598.586936] no locks held by smbd/8659.
> [ 598.597478]
> [ 598.597478] stack backtrace:
> [ 598.618275] CPU: 3 PID: 8659 Comm: smbd Not tainted 4.1.16+ #24
> [ 598.628820] Hardware name: System manufacturer System Product Name/P7F-X Series, BIOS 0703 09/24/2010
> [ 598.650020] ffffffff815d1319 ffff8800b8efbb88 ffffffff8163ee73 0000000000000000
> [ 598.661051] ffff880034fc4110 ffff8800b8efbbb8 ffffffff810db540 ffff880034fc4110
> [ 598.671990] ffff880034fc4110 ffff88023206bd40 ffffffff815d1319 ffff8800b8efbc08
> [ 598.682736] Call Trace:
> [ 598.693187] [<ffffffff815d1319>] ? spin_unlock+0x9/0x10
> [ 598.703798] [<ffffffff8163ee73>] dump_stack+0x4c/0x65
> [ 598.714223] [<ffffffff810db540>] print_unlock_imbalance_bug+0x100/0x110
> [ 598.724611] [<ffffffff815d1319>] ? spin_unlock+0x9/0x10
> [ 598.734763] [<ffffffff810e0d8e>] lock_release+0x2be/0x430
> [ 598.744636] [<ffffffff81648303>] _raw_spin_unlock+0x23/0x40
> [ 598.754230] [<ffffffff815d41a8>] ? unix_dgram_sendmsg+0x288/0x6f0
> [ 598.763840] [<ffffffff815d1319>] spin_unlock+0x9/0x10
> [ 598.773126] [<ffffffff815d41e7>] unix_dgram_sendmsg+0x2c7/0x6f0
> [ 598.782209] [<ffffffff814f6c9d>] sock_sendmsg+0x4d/0x60
> [ 598.791313] [<ffffffff814f7c3b>] ___sys_sendmsg+0x2db/0x2f0
> [ 598.800369] [<ffffffff812083c8>] ? kmem_cache_free+0x328/0x360
> [ 598.809383] [<ffffffff8127c1c0>] ? locks_free_lock+0x50/0x60
> [ 598.818157] [<ffffffff814f8649>] __sys_sendmsg+0x49/0x90
> [ 598.826742] [<ffffffff814f86a2>] SyS_sendmsg+0x12/0x20
> [ 598.835110] [<ffffffff816486f2>] system_call_fastpath+0x16/0x7a
> [ 598.843546] ------------[ cut here ]------------
> [ 598.851999] WARNING: CPU: 3 PID: 8659 at net/core/skbuff.c:691 skb_release_head_state+0xaa/0xb0()
I continued bisecting "v4.1.12..v4.1.17 -- net/unix/" and found that
reverting the attached patch in 4.1 fixes my problem (for now). The
original patch went into 4.4, but was backported to several stable trees:
v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
v3.18: 72032798034d921ed565e3bf8dfdc3098f6473e2
v4.1: 5c77e26862ce604edea05b3442ed765e9756fe0f
v4.2: bad967fdd8ecbdd171f5f243657be033d2d081a7
v4.3: 58a6a46a036ce81a2a8ecaa6fc1537c894349e3f
v4.4: 7d267278a9ece963d77eefec61630223fce08c6c
Philipp
View attachment "0001-unix-avoid-use-after-free-in-ep_remove_wait_queue.patch" of type "text/x-diff" (10381 bytes)