[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1326561043.5287.24.camel@edumazet-laptop>
Date:	Sat, 14 Jan 2012 18:10:43 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Sasha Levin <levinsasha928@...il.com>
Cc:	Dave Jones <davej@...hat.com>, davem <davem@...emloft.net>,
	Pekka Enberg <penberg@...nel.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Matt Mackall <mpm@...enic.com>, kaber@...sh.net,
	pablo@...filter.org, linux-kernel <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>, netfilter-devel@...r.kernel.org,
	netdev <netdev@...r.kernel.org>
Subject: Re: Hung task when calling clone() due to netfilter/slab
Le samedi 14 janvier 2012 à 18:30 +0200, Sasha Levin a écrit :
> Hi All,
> 
> I've stumbled on the following oops when testing the trinity fuzzer using KVM tool, running on the latest -next kernel:
> 
> [ 3960.845373] INFO: task trinity:31661 blocked for more than 120 seconds.
> [ 3960.846757] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3960.850245] trinity         D ffff880008e89610  4952 31661   2197 0x00000004
> [ 3960.855032]  ffff880008e39830 0000000000000086 ffff880008e39fd8 ffff880008e89610
> [ 3960.859289]  00000000001d3980 ffff880008e39fd8 ffff880008e38000 00000000001d3980
> [ 3960.862723]  00000000001d3980 00000000001d3980 ffff880008e39fd8 00000000001d3980
> [ 3960.866083] Call Trace:
> [ 3960.867332]  [<ffffffff825830ea>] schedule+0x3a/0x50
> [ 3960.869477]  [<ffffffff82581285>] schedule_timeout+0x245/0x2c0
> [ 3960.871996]  [<ffffffff811023ee>] ? mark_held_locks+0x6e/0x130
> [ 3960.875599]  [<ffffffff810ffb62>] ? lock_release_holdtime+0xb2/0x160
> [ 3960.879156]  [<ffffffff8258450b>] ? _raw_spin_unlock_irq+0x2b/0x70
> [ 3960.881682]  [<ffffffff810de041>] ? get_parent_ip+0x11/0x50
> [ 3960.883907]  [<ffffffff82583880>] wait_for_common+0x120/0x170
> [ 3960.886190]  [<ffffffff810dc930>] ? try_to_wake_up+0x320/0x320
> [ 3960.888570]  [<ffffffff82583978>] wait_for_completion+0x18/0x20
> [ 3960.890968]  [<ffffffff810c5595>] call_usermodehelper_exec+0x1b5/0x1d0
> [ 3960.893583]  [<ffffffff825837a4>] ? wait_for_common+0x44/0x170
> [ 3960.895907]  [<ffffffff8182aa58>] kobject_uevent_env+0x4e8/0x580
> [ 3960.898310]  [<ffffffff8182aafb>] kobject_uevent+0xb/0x10
> [ 3960.900577]  [<ffffffff811b59a1>] sysfs_slab_add+0xb1/0x210
> [ 3960.902830]  [<ffffffff811b74db>] kmem_cache_create+0xcb/0x2f0
> [ 3960.905704]  [<ffffffff8216c72d>] nf_conntrack_init+0x9d/0x380
> [ 3960.908797]  [<ffffffff8216d11f>] nf_conntrack_net_init+0xf/0x1a0
> [ 3960.911065]  [<ffffffff821195a2>] ops_init+0x42/0x180
> [ 3960.913192]  [<ffffffff8211974b>] setup_net+0x6b/0x100
> [ 3960.915308]  [<ffffffff82119ba6>] copy_net_ns+0x86/0x110
> [ 3960.917534]  [<ffffffff810d4049>] create_new_namespaces+0xd9/0x190
> [ 3960.920048]  [<ffffffff810d4224>] copy_namespaces+0x84/0xc0
> [ 3960.922326]  [<ffffffff810aa471>] copy_process+0xa21/0x1480
> [ 3960.924328]  [<ffffffff810d3d9e>] ? up_read+0x1e/0x40
> [ 3960.925327]  [<ffffffff810aaf83>] do_fork+0x73/0x340
> [ 3960.926273]  [<ffffffff811026dd>] ? trace_hardirqs_on+0xd/0x10
> [ 3960.927372]  [<ffffffff82581ea9>] ? mutex_unlock+0x9/0x10
> [ 3960.928422]  [<ffffffff812c4d13>] ? ext4_sync_file+0xa3/0x340
> [ 3960.929487]  [<ffffffff8258525d>] ? retint_swapgs+0x13/0x1b
> [ 3960.930520]  [<ffffffff810554b3>] sys_clone+0x23/0x30
> [ 3960.931475]  [<ffffffff82585ec3>] stub_clone+0x13/0x20
> [ 3960.932490]  [<ffffffff82585b39>] ? system_call_fastpath+0x16/0x1b
> [ 3960.933655] 2 locks held by trinity/31661:
> [ 3960.934413]  #0:  (net_mutex){+.+.+.}, at: [<ffffffff82119b9e>] copy_net_ns+0x7e/0x110
> [ 3960.936057]  #1:  (slub_lock){+.+.+.}, at: [<ffffffff811b7451>] kmem_cache_create+0x41/0x2f0
> [ 3960.937810] Kernel panic - not syncing: hung_task: blocked tasks
> 
> I had two trinity processes running, and both were stuck at clone() syscalls:
> 
> clone(clone_flags=0xcc2820ff, newsp=0xf3f270[page_0xff], parent_tid=0xf3f270[page_0xff], child_tid=0x400000, regs=0xf41290[page_allocs])
> clone(clone_flags=0x45706000, newsp=0xf3f270[page_0xff], parent_tid=0x7f9bf26f0000, child_tid=0xf3f270[page_0xff], regs=0xf41290[page_allocs])
> 
> This is the second time I got this oops, and both times it originated with clone calling up to the netfilter connection tracker code, so I don't think that it's coincidental that thats the origin.
> 
Apparently SLUB calls sysfs_slab_add() from kmem_cache_create() while
still holding slub_lock.
So if the task launched needs to "cat /proc/slabinfo" or anything
needing slub_lock, its a deadlock.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists
 
