lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.1002110903080.25392@p34.internal.lan>
Date:	Thu, 11 Feb 2010 09:09:39 -0500 (EST)
From:	Justin Piszcz <jpiszcz@...idpixels.com>
To:	linux-kernel@...r.kernel.org, linux-raid@...r.kernl.org,
	xfs@....sgi.com, Alan Piszcz <ap@...arrain.com>
Subject: 2.6.32.3 x86_64 - XFS hangs, all I/O to D-state bug (again)

Hello,

While tarring and compressing (bzip2) a lot of files, the following error
occurred, note the output is not clean because this was taken from netconsole.

When this occurs, the host cannot be rebooted with reboot/proceses cannot
be killed and the box locks up.  There are no apparent hardware issues.

Before, asterisk would trigger this bug, since asterisk no longer runs on 
this host, it ran for ~2-3 months without any problems, until now.

Please cc me as I am not on the lists, thanks.

Is this a md raid issue or XFS? From the trace it appears to be an XFS bug?

Feb 11 07:38:14 l1 [   20.270270] e100: eth1 NIC Link is Up 100 Mbps Full Duplex 
Feb 11 07:47:54 l1 [  600.432165] INFO: task scp:4871 blocked for more than 120 seconds. 
Feb 11 07:47:54 l1 [  600.432177] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Feb 11 07:47:54 l1 [  600.432182] scp           D
Feb 11 07:47:54 l1  ffff8801eee6f14c 
Feb 11 07:47:54 l1     0  4871   4870 0x00000000 
Feb 11 07:47:54 l1 [  600.432188]  ffff880220c87950
Feb 11 07:47:54 l1  0000000000000082
Feb 11 07:47:54 l1  0000000000200200
Feb 11 07:47:54 l1  ffff88023eb71400
Feb 11 07:47:54 l1 
Feb 11 07:47:54 l1 [  600.432196]  000000000000c928
Feb 11 07:47:54 l1  ffff8801f7a55fd8
Feb 11 07:47:54 l1  ffff880205fc6150
Feb 11 07:47:54 l1  ffff880205fc63c8
Feb 11 07:47:54 l1 
Feb 11 07:47:54 l1 [  600.432203]  0000000000001000
Feb 11 07:47:54 l1  ffffffff8108d02a
Feb 11 07:47:54 l1  ffff8801eee6f0c0
Feb 11 07:47:54 l1  ffff880205fc63c8
Feb 11 07:47:54 l1 
Feb 11 07:47:54 l1 [  600.432209] Call Trace: 
Feb 11 07:47:54 l1 [  600.432230]  [<ffffffff8108d02a>] ? generic_file_buffered_write+0x1aa/0x290 
Feb 11 07:47:54 l1 [  600.432236]  [<ffffffff8145f02d>] ? __down_write_nested+0x7d/0xb0 
Feb 11 07:47:54 l1 [  600.432242]  [<ffffffff8120387d>] ? xfs_write+0x23d/0x950 
Feb 11 07:47:54 l1 [  600.432250]  [<ffffffff810ba913>] ? do_sync_write+0xe3/0x130 
Feb 11 07:47:54 l1 [  600.432260]  [<ffffffff8106dd90>] ? autoremove_wake_function+0x0/0x30 
Feb 11 07:47:54 l1 [  600.432266]  [<ffffffff810e9654>] ? fsnotify+0x4/0x1a0 
Feb 11 07:47:54 l1 [  600.432271]  [<ffffffff8102f78e>] ? common_interrupt+0xe/0x13 
Feb 11 07:47:54 l1 [  600.432276]  [<ffffffff810bb45b>] ? vfs_write+0xcb/0x180 
Feb 11 07:47:54 l1 [  600.432280]  [<ffffffff810bb613>] ? sys_write+0x53/0xa0 
Feb 11 07:47:54 l1 [  600.432285]  [<ffffffff8102eeeb>] ? system_call_fastpath+0x16/0x1b 
Feb 11 07:47:54 l1 [  600.432291] INFO: task flush-9:3:4874 blocked for more than 120 seconds. 
Feb 11 07:47:54 l1 [  600.432294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Feb 11 07:47:54 l1 [  600.432297] flush-9:3     D
Feb 11 07:47:54 l1  ffff88021948d750 
Feb 11 07:47:54 l1     0  4874      2 0x00000000 
Feb 11 07:47:54 l1 [  600.432309]  ffff88023fa3c8c0
Feb 11 07:47:54 l1  0000000000000046
Feb 11 07:47:54 l1  000000001948d750
Feb 11 07:47:54 l1  ffffffff8163d788
Feb 11 07:47:54 l1 
Feb 11 07:47:54 l1 [  600.432323]  000000000000c928
Feb 11 07:47:54 l1  ffff8801f79fbfd8
Feb 11 07:47:54 l1  ffff88021948d750
Feb 11 07:47:54 l1  ffff88021948d9c8
Feb 11 07:47:54 l1 
Feb 11 07:47:54 l1 [  600.432329]  000000013f467bb0
Feb 11 07:47:54 l1  ffffffff8108c183
Feb 11 07:47:54 l1  0000000000000000
Feb 11 07:47:54 l1  ffff88021948d9c8
Feb 11 07:47:54 l1 
Feb 11 07:47:54 l1 [  600.432337] Call Trace: 
Feb 11 07:47:54 l1 [  600.432341]  [<ffffffff8108c183>] ? find_lock_page+0x23/0x80 
Feb 11 07:47:54 l1 [  600.432344]  [<ffffffff8108cbb1>] ? find_or_create_page+0x41/0xc0 
Feb 11 07:47:54 l1 [  600.432349]  [<ffffffff8145dbc5>] ? schedule_timeout+0x195/0x1f0 
Feb 11 07:47:54 l1 [  600.432352]  [<ffffffff8145ea81>] ? __down+0x61/0xa0 
Feb 11 07:47:54 l1 [  600.432356]  [<ffffffff81072046>] ? down+0x46/0x50 
Feb 11 07:47:54 l1 [  600.432360]  [<ffffffff811ff4a4>] ? _xfs_buf_find+0x134/0x220 
Feb 11 07:47:54 l1 [  600.432363]  [<ffffffff811ff5fe>] ? xfs_buf_get_flags+0x6e/0x190 
Feb 11 07:47:54 l1 [  600.432366]  [<ffffffff811ff732>] ? xfs_buf_read_flags+0x12/0xa0 
Feb 11 07:47:54 l1 [  600.432371]  [<ffffffff811f5226>] ? xfs_trans_read_buf+0x1f6/0x350 
Feb 11 07:47:54 l1 [  600.432377]  [<ffffffff811b0268>] ? xfs_read_agf+0x68/0x190 
Feb 11 07:47:54 l1 [  600.432380]  [<ffffffff811b03c0>] ? xfs_alloc_read_agf+0x30/0xd0 
Feb 11 07:47:54 l1 [  600.432383]  [<ffffffff811b2519>] ? xfs_alloc_fix_freelist+0x379/0x450 
Feb 11 07:47:54 l1 [  600.432388]  [<ffffffff811ddfe5>] ? xfs_iext_remove+0x35/0x80 
Feb 11 07:47:54 l1 [  600.432393]  [<ffffffff811bd83f>] ? xfs_bmap_add_extent_delay_real+0x5ef/0x11a0 
Feb 11 07:47:54 l1 [  600.432396]  [<ffffffff811f4e73>] ? xfs_trans_log_buf+0x63/0xa0 
Feb 11 07:47:54 l1 [  600.432401]  [<ffffffff811e7410>] ? xlog_state_get_iclog_space+0x60/0x2c0 
Feb 11 07:47:54 l1 [  600.432404]  [<ffffffff8145f087>] ? __down_read+0x17/0xae 
Feb 11 07:47:54 l1 [  600.432408]  [<ffffffff811b29c0>] ? xfs_alloc_vextent+0x310/0x4b0 
Feb 11 07:47:54 l1 [  600.432412]  [<ffffffff811bf148>] ? xfs_bmap_btalloc+0x598/0xa40 
Feb 11 07:47:54 l1 [  600.432417]  [<ffffffff811c19a2>] ? xfs_bmapi+0x9e2/0x11a0 
Feb 11 07:47:54 l1 [  600.432422]  [<ffffffff811f3d4f>] ? xfs_trans_reserve+0x9f/0x210 
Feb 11 07:47:54 l1 [  600.432425]  [<ffffffff811e2efe>] ? xfs_iomap_write_allocate+0x23e/0x3b0 
Feb 11 07:47:54 l1 [  600.432429]  [<ffffffff811e3aa0>] ? xfs_iomap+0x2c0/0x300 
Feb 11 07:47:54 l1 [  600.432433]  [<ffffffff811fb875>] ? xfs_map_blocks+0x25/0x30 
Feb 11 07:47:54 l1 [  600.432437]  [<ffffffff811fc9a4>] ? xfs_page_state_convert+0x414/0x6c0 
Feb 11 07:47:54 l1 [  600.432443]  [<ffffffff81224463>] ? radix_tree_gang_lookup_tag_slot+0xc3/0xf0 
Feb 11 07:47:54 l1 [  600.432447]  [<ffffffff811fcf57>] ? xfs_vm_writepage+0x77/0x130 
Feb 11 07:47:54 l1 [  600.432453]  [<ffffffff8109249a>] ? __writepage+0xa/0x40 
Feb 11 07:47:54 l1 [  600.432456]  [<ffffffff81092c6f>] ? write_cache_pages+0x1df/0x3c0 
Feb 11 07:47:54 l1 [  600.432459]  [<ffffffff81092490>] ? __writepage+0x0/0x40 
Feb 11 07:47:54 l1 [  600.432464]  [<ffffffff810d9092>] ? writeback_single_inode+0xd2/0x390 
Feb 11 07:47:54 l1 [  600.432468]  [<ffffffff810d9f3f>] ? writeback_inodes_wb+0x3ff/0x5e0 
Feb 11 07:47:54 l1 [  600.432473]  [<ffffffff810da23e>] ? wb_writeback+0x11e/0x1f0 
Feb 11 07:47:54 l1 [  600.432479]  [<ffffffff810630ee>] ? try_to_del_timer_sync+0x5e/0x90 
Feb 11 07:47:54 l1 [  600.432484]  [<ffffffff810da56b>] ? wb_do_writeback+0x17b/0x180 
Feb 11 07:47:54 l1 [  600.432487]  [<ffffffff810da5cd>] ? bdi_writeback_task+0x5d/0xa0 
Feb 11 07:47:54 l1 [  600.432492]  [<ffffffff8109e900>] ? bdi_start_fn+0x0/0xf0 
Feb 11 07:47:54 l1 [  600.432496]  [<ffffffff8109e97e>] ? bdi_start_fn+0x7e/0xf0 
Feb 11 07:47:54 l1 [  600.432499]  [<ffffffff8109e900>] ? bdi_start_fn+0x0/0xf0 
Feb 11 07:47:54 l1 [  600.432503]  [<ffffffff8106dab6>] ? kthread+0x96/0xa0 
Feb 11 07:47:54 l1 [  600.432508]  [<ffffffff8102fdfa>] ? child_rip+0xa/0x20 
Feb 11 07:47:54 l1 [  600.432513]  [<ffffffff8106da20>] ? kthread+0x0/0xa0 
Feb 11 07:47:54 l1 [  600.432517]  [<ffffffff8102fdf0>] ? child_rip+0x0/0x20 
Feb 11 07:49:54 l1 [  720.432171] INFO: task scp:4871 blocked for more than 120 seconds. 
Feb 11 07:49:54 l1 [  720.432187] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Feb 11 07:49:54 l1 [  720.432192] scp           D
Feb 11 07:49:54 l1  ffff8801eee6f14c 
Feb 11 07:49:54 l1     0  4871   4870 0x00000000 
Feb 11 07:49:54 l1 [  720.432198]  ffff880220c87950
Feb 11 07:49:54 l1  0000000000000082
Feb 11 07:49:54 l1  0000000000200200
Feb 11 07:49:54 l1  ffff88023eb71400
Feb 11 07:49:54 l1 
Feb 11 07:49:54 l1 [  720.432206]  000000000000c928
Feb 11 07:49:54 l1  ffff8801f7a55fd8
Feb 11 07:49:54 l1  ffff880205fc6150
Feb 11 07:49:54 l1  ffff880205fc63c8
Feb 11 07:49:54 l1 
Feb 11 07:49:54 l1 [  720.432217]  0000000000001000
Feb 11 07:49:54 l1  ffffffff8108d02a
Feb 11 07:49:54 l1  ffff8801eee6f0c0
Feb 11 07:49:54 l1  ffff880205fc63c8
Feb 11 07:49:54 l1 
Feb 11 07:49:54 l1 [  720.432224] Call Trace: 
Feb 11 07:49:54 l1 [  720.432238]  [<ffffffff8108d02a>] ? generic_file_buffered_write+0x1aa/0x290 
Feb 11 07:49:54 l1 [  720.432243]  [<ffffffff8145f02d>] ? __down_write_nested+0x7d/0xb0 
Feb 11 07:49:54 l1 [  720.432249]  [<ffffffff8120387d>] ? xfs_write+0x23d/0x950 
Feb 11 07:49:54 l1 [  720.432254]  [<ffffffff810ba913>] ? do_sync_write+0xe3/0x130 
Feb 11 07:49:54 l1 [  720.432259]  [<ffffffff8106dd90>] ? autoremove_wake_function+0x0/0x30 
Feb 11 07:49:54 l1 [  720.432265]  [<ffffffff810e9654>] ? fsnotify+0x4/0x1a0 
Feb 11 07:49:54 l1 [  720.432270]  [<ffffffff8102f78e>] ? common_interrupt+0xe/0x13 
Feb 11 07:49:54 l1 [  720.432275]  [<ffffffff810bb45b>] ? vfs_write+0xcb/0x180 
Feb 11 07:49:54 l1 [  720.432280]  [<ffffffff810bb613>] ? sys_write+0x53/0xa0 
Feb 11 07:49:54 l1 [  720.432285]  [<ffffffff8102eeeb>] ? system_call_fastpath+0x16/0x1b 
Feb 11 07:49:54 l1 [  720.432291] INFO: task flush-9:3:4874 blocked for more than 120 seconds. 
Feb 11 07:49:54 l1 [  720.432295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Feb 11 07:49:54 l1 [  720.432298] flush-9:3     D
Feb 11 07:49:54 l1  ffff88021948d750 
Feb 11 07:49:54 l1     0  4874      2 0x00000000 
Feb 11 07:49:54 l1 [  720.432306]  ffff88023fa3c8c0
Feb 11 07:49:54 l1  0000000000000046
Feb 11 07:49:54 l1  000000001948d750
Feb 11 07:49:54 l1  ffffffff8163d788
Feb 11 07:49:54 l1 
Feb 11 07:49:54 l1 [  720.432322]  000000000000c928
Feb 11 07:49:54 l1  ffff8801f79fbfd8
Feb 11 07:49:54 l1  ffff88021948d750
Feb 11 07:49:54 l1  ffff88021948d9c8
Feb 11 07:49:54 l1 
Feb 11 07:49:54 l1 [  720.432329]  000000013f467bb0
Feb 11 07:49:54 l1  ffffffff8108c183
Feb 11 07:49:54 l1  0000000000000000
Feb 11 07:49:54 l1  ffff88021948d9c8
Feb 11 07:49:54 l1 
Feb 11 07:49:54 l1 [  720.432336] Call Trace: 
Feb 11 07:49:54 l1 [  720.432340]  [<ffffffff8108c183>] ? find_lock_page+0x23/0x80 
Feb 11 07:49:54 l1 [  720.432343]  [<ffffffff8108cbb1>] ? find_or_create_page+0x41/0xc0 
Feb 11 07:49:54 l1 [  720.432348]  [<ffffffff8145dbc5>] ? schedule_timeout+0x195/0x1f0 
Feb 11 07:49:54 l1 [  720.432351]  [<ffffffff8145ea81>] ? __down+0x61/0xa0 
Feb 11 07:49:54 l1 [  720.432356]  [<ffffffff81072046>] ? down+0x46/0x50 
Feb 11 07:49:54 l1 [  720.432361]  [<ffffffff811ff4a4>] ? _xfs_buf_find+0x134/0x220 
Feb 11 07:49:54 l1 [  720.432365]  [<ffffffff811ff5fe>] ? xfs_buf_get_flags+0x6e/0x190 
Feb 11 07:49:54 l1 [  720.432368]  [<ffffffff811ff732>] ? xfs_buf_read_flags+0x12/0xa0 
Feb 11 07:49:54 l1 [  720.432373]  [<ffffffff811f5226>] ? xfs_trans_read_buf+0x1f6/0x350 
Feb 11 07:49:54 l1 [  720.432378]  [<ffffffff811b0268>] ? xfs_read_agf+0x68/0x190 
Feb 11 07:49:54 l1 [  720.432382]  [<ffffffff811b03c0>] ? xfs_alloc_read_agf+0x30/0xd0 
Feb 11 07:49:54 l1 [  720.432386]  [<ffffffff811b2519>] ? xfs_alloc_fix_freelist+0x379/0x450 
Feb 11 07:49:54 l1 [  720.432391]  [<ffffffff811ddfe5>] ? xfs_iext_remove+0x35/0x80 
Feb 11 07:49:54 l1 [  720.432396]  [<ffffffff811bd83f>] ? xfs_bmap_add_extent_delay_real+0x5ef/0x11a0 
Feb 11 07:49:54 l1 [  720.432399]  [<ffffffff811f4e73>] ? xfs_trans_log_buf+0x63/0xa0 
Feb 11 07:49:54 l1 [  720.432404]  [<ffffffff811e7410>] ? xlog_state_get_iclog_space+0x60/0x2c0 
Feb 11 07:49:54 l1 [  720.432410]  [<ffffffff8145f087>] ? __down_read+0x17/0xae 
Feb 11 07:49:54 l1 [  720.432415]  [<ffffffff811b29c0>] ? xfs_alloc_vextent+0x310/0x4b0 
Feb 11 07:49:54 l1 [  720.432420]  [<ffffffff811bf148>] ? xfs_bmap_btalloc+0x598/0xa40 
Feb 11 07:49:54 l1 [  720.432426]  [<ffffffff811c19a2>] ? xfs_bmapi+0x9e2/0x11a0 
Feb 11 07:49:54 l1 [  720.432430]  [<ffffffff811f3d4f>] ? xfs_trans_reserve+0x9f/0x210 
Feb 11 07:49:54 l1 [  720.432434]  [<ffffffff811e2efe>] ? xfs_iomap_write_allocate+0x23e/0x3b0 
Feb 11 07:49:54 l1 [  720.432437]  [<ffffffff811e3aa0>] ? xfs_iomap+0x2c0/0x300 
Feb 11 07:49:54 l1 [  720.432441]  [<ffffffff811fb875>] ? xfs_map_blocks+0x25/0x30 
Feb 11 07:49:54 l1 [  720.432445]  [<ffffffff811fc9a4>] ? xfs_page_state_convert+0x414/0x6c0 
Feb 11 07:49:54 l1 [  720.432451]  [<ffffffff81224463>] ? radix_tree_gang_lookup_tag_slot+0xc3/0xf0 
Feb 11 07:49:54 l1 [  720.432455]  [<ffffffff811fcf57>] ? xfs_vm_writepage+0x77/0x130 
Feb 11 07:49:54 l1 [  720.432459]  [<ffffffff8109249a>] ? __writepage+0xa/0x40 
Feb 11 07:49:54 l1 [  720.432462]  [<ffffffff81092c6f>] ? write_cache_pages+0x1df/0x3c0 
Feb 11 07:49:54 l1 [  720.432466]  [<ffffffff81092490>] ? __writepage+0x0/0x40 
Feb 11 07:49:54 l1 [  720.432472]  [<ffffffff810d9092>] ? writeback_single_inode+0xd2/0x390 
Feb 11 07:49:54 l1 [  720.432476]  [<ffffffff810d9f3f>] ? writeback_inodes_wb+0x3ff/0x5e0 
Feb 11 07:49:54 l1 [  720.432480]  [<ffffffff810da23e>] ? wb_writeback+0x11e/0x1f0 
Feb 11 07:49:54 l1 [  720.432485]  [<ffffffff810630ee>] ? try_to_del_timer_sync+0x5e/0x90 
Feb 11 07:49:54 l1 [  720.432489]  [<ffffffff810da56b>] ? wb_do_writeback+0x17b/0x180 
Feb 11 07:49:54 l1 [  720.432493]  [<ffffffff810da5cd>] ? bdi_writeback_task+0x5d/0xa0 
Feb 11 07:49:54 l1 [  720.432497]  [<ffffffff8109e900>] ? bdi_start_fn+0x0/0xf0 
Feb 11 07:49:54 l1 [  720.432500]  [<ffffffff8109e97e>] ? bdi_start_fn+0x7e/0xf0 
Feb 11 07:49:54 l1 [  720.432503]  [<ffffffff8109e900>] ? bdi_start_fn+0x0/0xf0 
Feb 11 07:49:54 l1 [  720.432506]  [<ffffffff8106dab6>] ? kthread+0x96/0xa0 
Feb 11 07:49:54 l1 [  720.432511]  [<ffffffff8102fdfa>] ? child_rip+0xa/0x20 
Feb 11 07:49:54 l1 [  720.432515]  [<ffffffff8106da20>] ? kthread+0x0/0xa0 
Feb 11 07:49:54 l1 [  720.432519]  [<ffffffff8102fdf0>] ? child_rip+0x0/0x20

Justin.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ