Message-id: <4B76D08D.3000000@majjas.com>
Date:	Sat, 13 Feb 2010 11:17:17 -0500
From:	Michael Breuer <mbreuer@...jas.com>
To:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Hung task - sync - 2.6.33-rc7  w/md6 multicore rebuild in process

Scenario:

1. raid6 (software, six 1 TB SATA drives) doing a resync (multicore enabled)
2. rebuilding kernel (rc8)
3. system became sluggish - top & vmstat showed all 12 GB of RAM in use, 
albeit about 10 GB of that as fs cache. It seemed as though reclaim of fs 
cache became really slow once there were no more "free" pages.
vmstat <after hung task reported - don't have from before>
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
  0  1    808 112476 347592 9556952    0    0    39   388  158  189  1 18 77  4  0
4. Worrying a bit about the looming instability, I typed, "sync."
5. sync took a long time, and was reported by the kernel as a hung task 
(repeatedly) - see below.
6. entering additional sync commands also hangs (unsurprising, but I 
figured I'd try as non-root).
7. The running sync (pid 11975) cannot be killed.
8. echo 1 > drop_caches does clear the fs cache. System behaves better 
after this (but sync is still hung).
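
For anyone wanting to check the memory numbers, here is a minimal Python sketch that parses the vmstat sample quoted in item 3 and adds up the buffer and page-cache columns. The header and row strings are copied from the report; the function and variable names are just illustrative, and treating buff + cache as "reclaimable" is a simplifying assumption (some of it may be dirty or pinned).

```python
# Column names and values copied from the vmstat sample in this report.
VMSTAT_HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa st"
VMSTAT_ROW = "0 1 808 112476 347592 9556952 0 0 39 388 158 189 1 18 77 4 0"


def parse_vmstat(header, row):
    """Pair each vmstat column name with its integer value."""
    names = header.split()
    values = [int(v) for v in row.split()]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(names, values))


sample = parse_vmstat(VMSTAT_HEADER, VMSTAT_ROW)

# vmstat reports memory columns in KiB by default.
cache_kib = sample["buff"] + sample["cache"]   # assumed mostly reclaimable
free_kib = sample["free"]

print("free:         %.2f GiB" % (free_kib / 1024.0 / 1024.0))
print("buff + cache: %.2f GiB" % (cache_kib / 1024.0 / 1024.0))
```

The point is only that free memory is tiny compared with buffers plus page cache, which matches the "all RAM used, mostly fs cache" observation.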

config attached.

Running with sky2 dma patches (in rc8) and increased the audit name 
space to avoid the flood of name space maxed warnings.

My current plan is to let the raid rebuild complete and then reboot (to 
rc8 if the bits made it to disk)... maybe with a backup of recently 
changed files to an external system.

Feb 13 10:54:13 mail kernel: INFO: task sync:11975 blocked for more than 120 seconds.
Feb 13 10:54:13 mail kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 13 10:54:13 mail kernel: sync          D 0000000000000002     0 11975   6433 0x00000000
Feb 13 10:54:13 mail kernel: ffff8801c45f3da8 0000000000000082 ffff8800282f5948 ffff8800282f5920
Feb 13 10:54:13 mail kernel: ffff88032f785d78 ffff88032f785d40 000000030c37a771 0000000000000282
Feb 13 10:54:13 mail kernel: ffff8801c45f3fd8 000000000000f888 ffff88032ca00000 ffff8801c61c9750
Feb 13 10:54:13 mail kernel: Call Trace:
Feb 13 10:54:13 mail kernel: [<ffffffff81154730>] ? bdi_sched_wait+0x0/0x20
Feb 13 10:54:13 mail kernel: [<ffffffff8115473e>] bdi_sched_wait+0xe/0x20
Feb 13 10:54:13 mail kernel: [<ffffffff81537b4f>] __wait_on_bit+0x5f/0x90
Feb 13 10:54:13 mail kernel: [<ffffffff81154730>] ? bdi_sched_wait+0x0/0x20
Feb 13 10:54:13 mail kernel: [<ffffffff81537bf8>] out_of_line_wait_on_bit+0x78/0x90
Feb 13 10:54:13 mail kernel: [<ffffffff81078650>] ? wake_bit_function+0x0/0x50
Feb 13 10:54:13 mail kernel: [<ffffffff8104ac55>] ? wake_up_process+0x15/0x20
Feb 13 10:54:13 mail kernel: [<ffffffff81155daf>] bdi_sync_writeback+0x6f/0x80
Feb 13 10:54:13 mail kernel: [<ffffffff81155de2>] sync_inodes_sb+0x22/0x100
Feb 13 10:54:13 mail kernel: [<ffffffff81159902>] __sync_filesystem+0x82/0x90
Feb 13 10:54:13 mail kernel: [<ffffffff81159a04>] sync_filesystems+0xf4/0x120
Feb 13 10:54:13 mail kernel: [<ffffffff81159a91>] sys_sync+0x21/0x40
Feb 13 10:54:13 mail kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

<this repeats every 120 seconds - all the same traceback>
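
Since the same report repeats, a small script can confirm that each occurrence names the same task and call chain. Below is a rough Python sketch, not a general syslog parser: the regular expressions are my own guesses fitted to the lines in this message, the embedded log is an abbreviated copy of the trace above, and the names (parse_hung_task, TASK_RE, FRAME_RE) are illustrative. Frames marked with "?" are skipped on the assumption that they are unverified stack entries.

```python
import re

# Abbreviated copy of the hung-task report from this message.
LOG = """\
Feb 13 10:54:13 mail kernel: INFO: task sync:11975 blocked for more than 120 seconds.
Feb 13 10:54:13 mail kernel: Call Trace:
Feb 13 10:54:13 mail kernel: [<ffffffff81154730>] ? bdi_sched_wait+0x0/0x20
Feb 13 10:54:13 mail kernel: [<ffffffff8115473e>] bdi_sched_wait+0xe/0x20
Feb 13 10:54:13 mail kernel: [<ffffffff81537b4f>] __wait_on_bit+0x5f/0x90
Feb 13 10:54:13 mail kernel: [<ffffffff81155daf>] bdi_sync_writeback+0x6f/0x80
Feb 13 10:54:13 mail kernel: [<ffffffff81155de2>] sync_inodes_sb+0x22/0x100
Feb 13 10:54:13 mail kernel: [<ffffffff81159a91>] sys_sync+0x21/0x40
"""

# Patterns fitted to the log format shown above (an assumption, not a spec).
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_hung_task(log):
    """Return ((comm, pid, seconds), [function names]) for one report."""
    task = None
    frames = []
    for line in log.splitlines():
        m = TASK_RE.search(line)
        if m:
            task = (m.group(1), int(m.group(2)), int(m.group(3)))
            continue
        m = FRAME_RE.search(line)
        if m and not m.group(1):      # skip "?" (unverified) frames
            frames.append(m.group(2))
    return task, frames


task, frames = parse_hung_task(LOG)
print(task)
print(" <- ".join(frames))
```

Read innermost first, the chain shows sync entering through sys_sync and sleeping in bdi_sched_wait via bdi_sync_writeback.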



View attachment "config_2.6.33-rc7AUDIT_NC_80-00077-g01a3cc3" of type "text/plain" (84885 bytes)
