[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1264663210.2793.110.camel@tonnant>
Date: Thu, 28 Jan 2010 02:20:10 -0500
From: Jon Masters <jonathan@...masters.org>
To: linux-kernel <linux-kernel@...r.kernel.org>
Cc: netdev <netdev@...r.kernel.org>, netfilter-devel@...r.kernel.org
Subject: Re: PROBLEM: reproducible crash KVM+nf_conntrack all recent 2.6
kernels
On Thu, 2010-01-28 at 00:46 -0500, Jon Masters wrote:
> A number of people seem to have reported this crash in various forms,
> but I have yet to see a solution, and can reproduce on 2.6.33-rc5 this
> evening so I know it's still present in the latest upstream kernels too.
> Userspace is Fedora 12, and this happens on both all recent F12 kernels
> (sporadic in 2.6.31 until recently, solidly reproducible on 2.6.32) and
> upstream 2.6.32, and 2.6.33-rc5 also - hard to find a "known good".
>
> The problem happens when using netfilter with KVM (problem does not
> occur without the firewall loaded, for example) and will occur within a
> few minutes of attempting to start or stop a guest that is connecting to
> the network - the easiest way to reproduce so far is simply to start up
> a bunch of Fedora guests and have them do a "yum update" cycle.
>
> All of the crashes appear similar to the following (2.6.33-rc5):
Rebuilt the kernel with all debug options turned on, got some lockdep
warnings (haven't looked further yet). Here's the output (attached full
boot log also):
[ 205.843071]
[ 205.843073] =======================================================
[ 205.843769] [ INFO: possible circular locking dependency detected ]
[ 205.843769] 2.6.33-rc5 #3
[ 205.843769] -------------------------------------------------------
[ 205.843769] libvirtd/3058 is trying to acquire lock:
[ 205.843769] (s_active){++++.+}, at: [<ffffffff81171a56>]
sysfs_addrm_finish+0x36/0x55
[ 205.843769]
[ 205.843769] but task is already holding lock:
[ 205.843769] (rtnl_mutex){+.+.+.}, at: [<ffffffff813cedd9>] rtnl_lock
+0x17/0x19
[ 205.843769]
[ 205.843769] which lock already depends on the new lock.
[ 205.843769]
[ 205.843769]
[ 205.843769] the existing dependency chain (in reverse order) is:
[ 205.843769]
[ 205.843769] -> #4 (rtnl_mutex){+.+.+.}:
[ 205.843769] [<ffffffff8107ea94>] __lock_acquire+0xb84/0xd3c
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff8145504b>] __mutex_lock_common
+0x4b/0x392
[ 205.843769] [<ffffffff81455456>] mutex_lock_nested+0x3e/0x43
[ 205.843769] [<ffffffff813cedd9>] rtnl_lock+0x17/0x19
[ 205.843769] [<ffffffff813d05c0>] linkwatch_event+0xe/0x2c
[ 205.843769] [<ffffffff8106713f>] worker_thread+0x266/0x35f
[ 205.843769] [<ffffffff8106b1d0>] kthread+0x9a/0xa2
[ 205.843769] [<ffffffff8100aaa4>] kernel_thread_helper+0x4/0x10
[ 205.843769]
[ 205.843769] -> #3 ((linkwatch_work).work){+.+.+.}:
[ 205.843769] [<ffffffff8107ea94>] __lock_acquire+0xb84/0xd3c
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff81067136>] worker_thread+0x25d/0x35f
[ 205.843769] [<ffffffff8106b1d0>] kthread+0x9a/0xa2
[ 205.843769] [<ffffffff8100aaa4>] kernel_thread_helper+0x4/0x10
[ 205.843769]
[ 205.843769] -> #2 (events){+.+.+.}:
[ 205.843769] [<ffffffff8107ea94>] __lock_acquire+0xb84/0xd3c
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff81067a8b>] flush_workqueue+0x63/0xb1
[ 205.843769] [<ffffffff81067aee>] flush_scheduled_work
+0x15/0x17
[ 205.843769] [<ffffffff8138168b>] do_md_stop+0x2c0/0x508
[ 205.843769] [<ffffffff81383b94>] md_ioctl+0x92f/0xf49
[ 205.843769] [<ffffffff8121442d>] __blkdev_driver_ioctl
+0x39/0xa3
[ 205.843769] [<ffffffff81214dc8>] blkdev_ioctl+0x67d/0x6b1
[ 205.843769] [<ffffffff8113ef34>] block_ioctl+0x37/0x3b
[ 205.843769] [<ffffffff81126294>] vfs_ioctl+0x32/0xa6
[ 205.843769] [<ffffffff81126814>] do_vfs_ioctl+0x490/0x4d6
[ 205.843769] [<ffffffff811268b0>] sys_ioctl+0x56/0x79
[ 205.843769] [<ffffffff81009c32>] system_call_fastpath
+0x16/0x1b
[ 205.843769]
[ 205.843769] -> #1 (&new->reconfig_mutex){+.+.+.}:
[ 205.843769] [<ffffffff8107ea94>] __lock_acquire+0xb84/0xd3c
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff8145504b>] __mutex_lock_common
+0x4b/0x392
[ 205.843769] [<ffffffff814553d0>]
mutex_lock_interruptible_nested+0x3e/0x43
[ 205.843769] [<ffffffff8137bd62>] mddev_lock+0x17/0x19
[ 205.843769] [<ffffffff8137c039>] md_attr_show+0x32/0x5d
[ 205.843769] [<ffffffff811708dc>] sysfs_read_file+0xbd/0x17f
[ 205.843769] [<ffffffff8111983d>] vfs_read+0xab/0x108
[ 205.843769] [<ffffffff8111995a>] sys_read+0x4a/0x6e
[ 205.843769] [<ffffffff81009c32>] system_call_fastpath
+0x16/0x1b
[ 205.843769]
[ 205.843769] -> #0 (s_active){++++.+}:
[ 205.843769] [<ffffffff8107e93e>] __lock_acquire+0xa2e/0xd3c
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff81171463>] sysfs_deactivate+0x9a/0x103
[ 205.843769] [<ffffffff81171a56>] sysfs_addrm_finish+0x36/0x55
[ 205.843769] [<ffffffff81171aab>] remove_dir+0x36/0x3e
[ 205.843769] [<ffffffff81171b60>] sysfs_remove_dir+0x9d/0xbe
[ 205.843769] [<ffffffff812206c1>] kobject_del+0x16/0x37
[ 205.843769] [<ffffffff8122078a>] kobject_release+0xa8/0xf8
[ 205.843769] [<ffffffff8122165d>] kref_put+0x43/0x4d
[ 205.843769] [<ffffffff8122063c>] kobject_put+0x47/0x4b
[ 205.843769] [<ffffffffa02d5ef5>] br_sysfs_delbr+0x21/0x43
[bridge]
[ 205.843769] [<ffffffffa02d20e9>] del_br+0x5a/0x6f [bridge]
[ 205.843769] [<ffffffffa02d24b2>] br_del_bridge+0x59/0x67
[bridge]
[ 205.843769] [<ffffffffa02d3764>] br_ioctl_deviceless_stub
+0x1fb/0x21b [bridge]
[ 205.843769] [<ffffffff813b32c4>] sock_ioctl+0x141/0x20d
[ 205.843769] [<ffffffff81126294>] vfs_ioctl+0x32/0xa6
[ 205.843769] [<ffffffff81126814>] do_vfs_ioctl+0x490/0x4d6
[ 205.843769] [<ffffffff811268b0>] sys_ioctl+0x56/0x79
[ 205.843769] [<ffffffff81009c32>] system_call_fastpath
+0x16/0x1b
[ 205.843769]
[ 205.843769] other info that might help us debug this:
[ 205.843769]
[ 205.843769] 2 locks held by libvirtd/3058:
[ 205.843769] #0: (br_ioctl_mutex){+.+.+.}, at: [<ffffffff813b32a9>]
sock_ioctl+0x126/0x20d
[ 205.843769] #1: (rtnl_mutex){+.+.+.}, at: [<ffffffff813cedd9>]
rtnl_lock+0x17/0x19
[ 205.843769]
[ 205.843769] stack backtrace:
[ 205.843769] Pid: 3058, comm: libvirtd Not tainted 2.6.33-rc5 #3
[ 205.843769] Call Trace:
[ 205.843769] [<ffffffff8107dae0>] print_circular_bug+0xa8/0xb6
[ 205.843769] [<ffffffff8107e93e>] __lock_acquire+0xa2e/0xd3c
[ 205.843769] [<ffffffff81171a56>] ? sysfs_addrm_finish+0x36/0x55
[ 205.843769] [<ffffffff8107ed19>] lock_acquire+0xcd/0xf1
[ 205.843769] [<ffffffff81171a56>] ? sysfs_addrm_finish+0x36/0x55
[ 205.843769] [<ffffffff81171463>] sysfs_deactivate+0x9a/0x103
[ 205.843769] [<ffffffff81171a56>] ? sysfs_addrm_finish+0x36/0x55
[ 205.843769] [<ffffffff8107c028>] ? trace_hardirqs_off+0xd/0xf
[ 206.367937] [<ffffffff81454e6a>] ? __mutex_unlock_slowpath
+0x120/0x132
[ 206.367937] [<ffffffff81171a56>] sysfs_addrm_finish+0x36/0x55
[ 206.367937] [<ffffffff81171aab>] remove_dir+0x36/0x3e
[ 206.367937] [<ffffffff81171b60>] sysfs_remove_dir+0x9d/0xbe
[ 206.367937] [<ffffffff812206c1>] kobject_del+0x16/0x37
[ 206.367937] [<ffffffff8122078a>] kobject_release+0xa8/0xf8
[ 206.367937] [<ffffffff812206e2>] ? kobject_release+0x0/0xf8
[ 206.367937] [<ffffffff8122165d>] kref_put+0x43/0x4d
[ 206.367937] [<ffffffff8122063c>] kobject_put+0x47/0x4b
[ 206.367937] [<ffffffff8105d218>] ? del_timer_sync+0x0/0xa6
[ 206.367937] [<ffffffffa02d5ef5>] br_sysfs_delbr+0x21/0x43 [bridge]
[ 206.367937] [<ffffffffa02d20e9>] del_br+0x5a/0x6f [bridge]
[ 206.367937] [<ffffffffa02d24b2>] br_del_bridge+0x59/0x67 [bridge]
[ 206.367937] [<ffffffffa02d3764>] br_ioctl_deviceless_stub
+0x1fb/0x21b [bridge]
[ 206.367937] [<ffffffff813b32c4>] sock_ioctl+0x141/0x20d
[ 206.367937] [<ffffffff81126294>] vfs_ioctl+0x32/0xa6
[ 206.367937] [<ffffffff81126814>] do_vfs_ioctl+0x490/0x4d6
[ 206.367937] [<ffffffff811268b0>] sys_ioctl+0x56/0x79
[ 206.367937] [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
[ 206.482734] kobject: 'brif' (ffff880213a243b8): calling ktype release
[ 206.492872] kobject: (ffff880213a243b8): dynamic_kobj_release
[ 206.498841] kobject: 'brif': free name
[ 206.527833] kobject: 'virbr0' (ffff8801cf3b0500): kobject_uevent_env
[ 206.534360] kobject: 'virbr0' (ffff8801cf3b0500): fill_kobj_path:
path = '/devices/virtual/net/virbr0'
[ 206.551468] kobject: 'virbr0' (ffff8801cf3b0500): kobject_cleanup
[ 206.557611] kobject: 'virbr0' (ffff8801cf3b0500): calling ktype
release
[ 206.564358] kobject: 'virbr0': free name
[ 208.575408] kobject: 'tun' (ffffffffa044b510): kobject_add_internal:
parent: 'module', set: 'module'
[ 208.595562] kobject: 'holders' (ffff8801d352cf68):
kobject_add_internal: parent: 'tun', set: '<NULL>'
[ 208.604990] kobject: 'tun' (ffffffffa044b510): kobject_uevent_env
[ 208.611296] kobject: 'tun' (ffffffffa044b510): fill_kobj_path: path =
'/module/tun'
[ 208.620119] kobject: 'notes' (ffff8801d352c660):
kobject_add_internal: parent: 'tun', set: '<NULL>'
[ 208.630343] tun: Universal TUN/TAP device driver, 1.6
[ 208.635421] tun: (C) 1999-2004 Max Krasnyansky <maxk@...lcomm.com>
[ 208.641721] kobject: 'tun' (ffff8801cd319e08): kobject_add_internal:
parent: 'misc', set: 'devices'
[ 208.651194] kobject: 'tun' (ffff8801cd319e08): kobject_uevent_env
[ 208.657386] kobject: 'tun' (ffff8801cd319e08): fill_kobj_path: path =
'/devices/virtual/misc/tun'
[ 208.671916] kobject: 'vnet0' (ffff880207df35e0):
kobject_add_internal: parent: 'net', set: 'devices'
[ 208.682667] kobject: 'vnet0' (ffff880207df35e0): kobject_uevent_env
[ 208.689435] kobject: 'vnet0' (ffff880207df35e0): fill_kobj_path: path
= '/devices/virtual/net/vnet0'
[ 208.712572] device vnet0 entered promiscuous mode
[ 208.717571] kobject: 'brport' (ffff8801cd378e88):
kobject_add_internal: parent: 'vnet0', set: '<NULL>'
[ 208.728149] kobject: 'brport' (ffff8801cd378e88): kobject_uevent_env
[ 208.734561] kobject: 'brport' (ffff8801cd378e88): kobject_uevent_env:
filter function caused the event to drop!
[ 208.748356] br0: port 2(vnet0) entering forwarding state
[ 209.576245] kobject: 'vnet1' (ffff8801cc9a0500):
kobject_add_internal: parent: 'net', set: 'devices'
[ 209.587167] kobject: 'vnet1' (ffff8801cc9a0500): kobject_uevent_env
[ 209.593773] kobject: 'vnet1' (ffff8801cc9a0500): fill_kobj_path: path
= '/devices/virtual/net/vnet1'
[ 209.614211] device vnet1 entered promiscuous mode
[ 209.619898] kobject: 'brport' (ffff8801cee35308):
kobject_add_internal: parent: 'vnet1', set: '<NULL>'
[ 209.631310] kobject: 'brport' (ffff8801cee35308): kobject_uevent_env
[ 209.638303] kobject: 'brport' (ffff8801cee35308): kobject_uevent_env:
filter function caused the event to drop!
[ 209.654971] br0: port 3(vnet1) entering forwarding state
[ 210.268995] SELinux: initialized (dev mqueue, type mqueue), uses
transition SIDs
[ 210.277526] kobject: 'pid_2' (ffff88020e222128):
kobject_add_internal: parent: 'slab', set: 'slab'
[ 210.288037] kobject: 'pid_2' (ffff88020e222128): kobject_uevent_env
[ 210.294431] kobject: 'pid_2' (ffff88020e222128): fill_kobj_path: path
= '/kernel/slab/pid_2'
[ 210.304019] SELinux: initialized (dev proc, type proc), uses
genfs_contexts
[ 210.397532] SELinux: initialized (dev mqueue, type mqueue), uses
transition SIDs
[ 210.434435] lo: Disabled Privacy Extensions
[ 210.443268] SELinux: initialized (dev proc, type proc), uses
genfs_contexts
[ 210.787652] kobject: 'lo' (ffff880207b725a0): kobject_cleanup
[ 210.793444] kobject: 'lo' (ffff880207b725a0): calling ktype release
[ 210.799783] kobject: 'lo': free name
[ 218.966215] vnet0: no IPv6 routers present
[ 220.336219] vnet1: no IPv6 routers present
[ 238.799662] kobject: 'vnet2' (ffff8801ccba8500):
kobject_add_internal: parent: 'net', set: 'devices'
[ 238.811833] kobject: 'vnet2' (ffff8801ccba8500): kobject_uevent_env
[ 238.818841] kobject: 'vnet2' (ffff8801ccba8500): fill_kobj_path: path
= '/devices/virtual/net/vnet2'
[ 238.841088] device vnet2 entered promiscuous mode
[ 238.846209] kobject: 'brport' (ffff8801cf2d92d0):
kobject_add_internal: parent: 'vnet2', set: '<NULL>'
[ 238.856592] kobject: 'brport' (ffff8801cf2d92d0): kobject_uevent_env
[ 238.863088] kobject: 'brport' (ffff8801cf2d92d0): kobject_uevent_env:
filter function caused the event to drop!
[ 238.878798] br0: port 4(vnet2) entering forwarding state
[ 249.395339] vnet2: no IPv6 routers present
[ 339.729225] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 339.730086] last sysfs
file: /sys/devices/virtual/block/md0/md/sync_speed
[ 339.730086] CPU 7
[ 339.730086] Pid: 3245, comm: qemu-kvm Not tainted 2.6.33-rc5 #3
0F9382/Precision WorkStation 490
[ 339.730086] RIP: 0010:[<ffffffff813e5f3e>] [<ffffffff813e5f3e>]
nf_ct_remove_expectations+0x49/0x5c
[ 339.730086] RSP: 0018:ffff880030603c18 EFLAGS: 00010206
[ 339.730086] RAX: ffff8801c2ea530b RBX: 009ab2000008d300 RCX:
ffffffffa02925c1
[ 339.730086] RDX: 000000000000006b RSI: ffff880030603be8 RDI:
009ab2000008d378
[ 339.730086] RBP: ffff880030603c28 R08: 0000000000000002 R09:
0000000000000000
[ 339.730086] R10: 0000000000000003 R11: ffff88019d28a268 R12:
ffffffff829a0400
[ 339.730086] R13: ffffffffa02925c1 R14: 0000000000000000 R15:
ffff8801a18e4630
[ 339.730086] FS: 00007fb6197ca780(0000) GS:ffff880030600000(0000)
knlGS:0000000000000000
[ 339.730086] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 339.730086] CR2: 00007f747a184178 CR3: 00000001badd8000 CR4:
00000000000026e0
[ 339.730086] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 339.730086] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 339.730086] Process qemu-kvm (pid: 3245, threadinfo ffff8801b9370000,
task ffff8801d53c48a0)
[ 339.730086] Stack:
[ 339.730086] ffffffff81b045d0 ffffffff829a0400 ffff880030603c48
ffffffff813e3426
[ 339.730086] <0> ffff8801a18e4600 ffff88019d28a250 ffff880030603c68
ffffffff813e10f8
[ 339.730086] <0> ffff880030603c68 ffffffff81b045d0 ffff880030603c88
ffffffff813bc080
[ 339.730086] Call Trace:
[ 339.730086] <IRQ>
[ 339.730086] [<ffffffff813e3426>] destroy_conntrack+0x73/0x121
[ 339.730086] [<ffffffff813e10f8>] nf_conntrack_destroy+0x2c/0x33
[ 339.730086] [<ffffffff813bc080>] skb_release_head_state+0x77/0xb9
[ 339.730086] [<ffffffff813bbde2>] __kfree_skb+0x16/0x82
[ 339.730086] [<ffffffff813bbee7>] kfree_skb+0x6a/0x73
[ 339.730086] [<ffffffffa02925c1>] ip6_mc_input+0x214/0x221 [ipv6]
[ 339.730086] [<ffffffffa02925f5>] ip6_rcv_finish+0x27/0x2b [ipv6]
[ 339.730086] [<ffffffffa0292938>] ipv6_rcv+0x33f/0x37f [ipv6]
[ 339.730086] [<ffffffff813c416a>] netif_receive_skb+0x3b0/0x3da
[ 339.730086] [<ffffffffa02d2b6e>] br_handle_frame_finish+0x104/0x13c
[bridge]
[ 339.730086] [<ffffffffa02d2d37>] br_handle_frame+0x191/0x1aa
[bridge]
[ 339.730086] [<ffffffff813c40d6>] netif_receive_skb+0x31c/0x3da
[ 339.730086] [<ffffffff8106e6ee>] ? __run_hrtimer+0xec/0x11b
[ 339.730086] [<ffffffff813c4230>] process_backlog+0x9c/0xd8
[ 339.730086] [<ffffffff813c489b>] net_rx_action+0x7d/0x19c
[ 339.730086] [<ffffffff81055d08>] __do_softirq+0xf8/0x1cd
[ 339.730086] [<ffffffff8100ab9c>] call_softirq+0x1c/0x30
[ 339.730086] <EOI>
[ 339.730086] [<ffffffff8100c38f>] ? do_softirq+0x4b/0xa3
[ 339.730086] [<ffffffff813c49e0>] netif_rx_ni+0x26/0x2b
[ 339.730086] [<ffffffffa0449d6d>] tun_chr_aio_write+0x3ce/0x429 [tun]
[ 339.730086] [<ffffffffa044999f>] ? tun_chr_aio_write+0x0/0x429 [tun]
[ 339.730086] [<ffffffff81118fd2>] do_sync_readv_writev+0xc1/0x100
[ 339.730086] [<ffffffff811e9ee1>] ? selinux_file_permission+0xa7/0xb3
[ 339.730086] [<ffffffff81118d36>] ? copy_from_user+0x2f/0x31
[ 339.730086] [<ffffffff811e02f5>] ? security_file_permission
+0x16/0x18
[ 339.730086] [<ffffffff81119d1f>] do_readv_writev+0xa7/0x127
[ 339.730086] [<ffffffff8111a1d7>] ? rcu_read_unlock+0x21/0x23
[ 339.730086] [<ffffffff8111a2bd>] ? fget_light+0xe4/0xf2
[ 339.730086] [<ffffffff8111a230>] ? fget_light+0x57/0xf2
[ 339.730086] [<ffffffff81119de2>] vfs_writev+0x43/0x4e
[ 339.730086] [<ffffffff81119ed2>] sys_writev+0x4a/0x93
[ 339.730086] [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
[ 339.730086] Code: 38 0f b6 d2 48 01 d0 74 30 48 8b 58 28 eb 13 48 89
df e8 fc f8 ff ff 48 89 df e8 c8 f8 ff ff 4c 89 e3 48 85 db 74 12 48 8d
7b 78 <4c> 8b 23 e8 e2 73 c7 ff 85 c0 74 e8 eb d6 5b 41 5c c9 c3 55 48
[ 339.730086] RIP [<ffffffff813e5f3e>] nf_ct_remove_expectations
+0x49/0x5c
[ 339.730086] RSP <ffff880030603c18>
[ 340.095082] ---[ end trace 7be6d51534d25d87 ]---
[ 340.099728] Kernel panic - not syncing: Fatal exception in interrupt
[ 340.106123] Pid: 3245, comm: qemu-kvm Tainted: G D 2.6.33-rc5
#3
[ 340.112853] Call Trace:
[ 340.112860] <IRQ> [<ffffffff81453b1d>] panic+0x7a/0x142
[ 340.112886] [<ffffffff81458011>] oops_end+0xb7/0xc7
[ 340.112898] [<ffffffff8100d421>] die+0x5a/0x63
[ 340.112909] [<ffffffff81457bcc>] do_general_protection+0x12a/0x133
[ 340.112922] [<ffffffff81457100>] ? irq_return+0x0/0x10
[ 340.112956] [<ffffffffa02925c1>] ? ip6_mc_input+0x214/0x221 [ipv6]
[ 340.112969] [<ffffffff81457345>] general_protection+0x25/0x30
[ 340.113002] [<ffffffffa02925c1>] ? ip6_mc_input+0x214/0x221 [ipv6]
Jon.
View attachment "kvm_crash_20100128.txt" of type "text/plain" (80209 bytes)
Powered by blists - more mailing lists