linux-kernel - debug advice request

Message-ID: <courier.48186806.00007CAE@fs.ru.acad.bg>
Date:	Wed, 30 Apr 2008 15:37:26 +0300
From:	plamen.petrov@...ru.acad.bg
To:	linux-kernel@...r.kernel.org
Subject: debug advice request

Hi! 

I would like to ask the relevant kernel developers for advice
on how to debug the following problem: 

While I'm compiling the kernel, the system stops its activity
but otherwise continues to function. When I look around to find out
what's going on, I find several processes in the D state,
and dmesg outputs things like this: 

[ 1032.940632] INFO: task xfsdatad/0:317 blocked for more than 120 seconds.
[ 1032.940638] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1032.940642] xfsdatad/0    D ffff81000102cdc0     0   317      2
[ 1032.940683]  ffff81007f9bbdd0 0000000000000046 0000000000000000 0000000000000000
[ 1032.940691]  ffff81007fb335a0 ffffffff8089c4a0 ffff81007fb338e8 000000010000ea09
[ 1032.940697]  0000000000000000 0000000000000000 0000000000000003 0000000000000000
[ 1032.940703] Call Trace:
[ 1032.940711]  [<ffffffff803a1f60>] xfs_end_bio_delalloc+0x0/0x20
[ 1032.940717]  [<ffffffff806b8a29>] __down_write_nested+0x79/0xc0
[ 1032.940800]  [<ffffffff8037f125>] xfs_ilock+0xa5/0xe0
[ 1032.940811]  [<ffffffff803a1db0>] xfs_setfilesize+0x40/0xc0
[ 1032.940814]  [<ffffffff803a1f70>] xfs_end_bio_delalloc+0x10/0x20
[ 1032.940817]  [<ffffffff8024c8f0>] run_workqueue+0x140/0x220
[ 1032.940820]  [<ffffffff8024caa0>] worker_thread+0x0/0xd0
[ 1032.940822]  [<ffffffff8024cb31>] worker_thread+0x91/0xd0
[ 1032.940825]  [<ffffffff80250840>] autoremove_wake_function+0x0/0x30
[ 1032.940828]  [<ffffffff8024caa0>] worker_thread+0x0/0xd0
[ 1032.940830]  [<ffffffff8024caa0>] worker_thread+0x0/0xd0
[ 1032.940832]  [<ffffffff802503ab>] kthread+0x4b/0x80
[ 1032.940835]  [<ffffffff8020c428>] child_rip+0xa/0x12
[ 1032.940837]  [<ffffffff802504f4>] kthreadd+0x114/0x1a0
[ 1032.940839]  [<ffffffff80250360>] kthread+0x0/0x80
[ 1032.940940]  [<ffffffff8020c41e>] child_rip+0x0/0x12
[ 1032.940942]
[ 1032.940943] INFO: lockdep is turned off.
[ 1032.940953] INFO: task pdflush:27505 blocked for more than 120 seconds.
[ 1032.940955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1032.940957] pdflush       D ffff81000102cdc0     0 27505      2
[ 1032.940960]  ffff810065bef4c0 0000000000000046 0000000000000000 ffffffff8040ce06
[ 1032.940964]  ffff81006dad5960 ffffffff8089c4a0 ffff81006dad5ca8 000000010000e9f2
[ 1032.940968]  0000000000036a20 000000008038c9f7 0000000000000003 ffffffff8038a767
[ 1032.940972] Call Trace:
[ 1032.940976]  [<ffffffff8040ce06>] __spin_lock_init+0x36/0x70
[ 1032.940979]  [<ffffffff8038a767>] xlog_grant_push_ail+0x47/0x160
[ 1032.940982]  [<ffffffff806b8a29>] __down_write_nested+0x79/0xc0
[ 1032.940984]  [<ffffffff8037f125>] xfs_ilock+0xa5/0xe0
[ 1032.940987]  [<ffffffff8038661b>] xfs_iomap_write_allocate+0x11b/0x3c0
[ 1032.940990]  [<ffffffff806b8ea1>] _spin_lock_irqsave+0x41/0x60
[ 1032.940993]  [<ffffffff8038742e>] xfs_iomap+0x23e/0x2d0
[ 1032.940995]  [<ffffffff803a2067>] xfs_map_blocks+0x37/0x90
[ 1032.940997]  [<ffffffff803a3576>] xfs_page_state_convert+0x296/0x640
[ 1032.941001]  [<ffffffff80253635>] ktime_get_ts+0x25/0x60
[ 1032.941003]  [<ffffffff806b9519>] _spin_unlock+0x29/0x50
[ 1032.941006]  [<ffffffff8025367c>] ktime_get+0xc/0x50
[ 1032.941008]  [<ffffffff803a3a58>] xfs_vm_writepage+0x68/0x110
[ 1032.941012]  [<ffffffff8027800e>] shrink_page_list+0x52e/0x680
[ 1032.941015]  [<ffffffff803ec57d>] blk_recount_segments+0x3d/0x80
[ 1032.941018]  [<ffffffff8026fc7b>] mempool_alloc+0x4b/0x140
[ 1032.941020]  [<ffffffff80277771>] isolate_lru_pages+0x1a1/0x240
[ 1032.941023]  [<ffffffff802782c4>] shrink_inactive_list+0x164/0x450
[ 1032.941026]  [<ffffffff80278993>] shrink_zone+0xb3/0x130
[ 1032.941028]  [<ffffffff8027919f>] try_to_free_pages+0x24f/0x3d0
[ 1032.941031]  [<ffffffff80277810>] isolate_pages_global+0x0/0x40
[ 1032.941034]  [<ffffffff802728f5>] __alloc_pages_internal+0x1b5/0x460
[ 1032.941036]  [<ffffffff80272c35>] __get_free_pages+0x15/0x60
[ 1032.941038]  [<ffffffff803a1b5b>] kmem_alloc+0x5b/0x100
[ 1032.941041]  [<ffffffff8038410a>] xfs_iflush_cluster+0x4a/0x3b0
[ 1032.941043]  [<ffffffff806b9519>] _spin_unlock+0x29/0x50
[ 1032.941046]  [<ffffffff80383049>] xfs_iflush_int+0x2d9/0x340
[ 1032.941048]  [<ffffffff803846e0>] xfs_iflush+0x270/0x310
[ 1032.941052]  [<ffffffff8039bea1>] xfs_inode_flush+0xb1/0xe0
[ 1032.941055]  [<ffffffff803ab8d5>] xfs_fs_write_inode+0x25/0x70
[ 1032.941058]  [<ffffffff802b901f>] __writeback_single_inode+0x25f/0x350
[ 1032.941061]  [<ffffffff806b9519>] _spin_unlock+0x29/0x50
[ 1032.941064]  [<ffffffff803899aa>] xfs_log_need_covered+0x7a/0xd0
[ 1032.941066]  [<ffffffff802b9577>] sync_sb_inodes+0x207/0x310
[ 1032.941069]  [<ffffffff802b98d2>] writeback_inodes+0xa2/0xf0
[ 1032.941071]  [<ffffffff80273df6>] wb_kupdate+0xa6/0x120
[ 1032.941073]  [<ffffffff80274ee0>] pdflush+0x0/0x1f0
[ 1032.941076]  [<ffffffff80274ee0>] pdflush+0x0/0x1f0
[ 1032.941078]  [<ffffffff80275001>] pdflush+0x121/0x1f0
[ 1032.941080]  [<ffffffff80273d50>] wb_kupdate+0x0/0x120
[ 1032.941082]  [<ffffffff802503ab>] kthread+0x4b/0x80
[ 1032.941084]  [<ffffffff8020c428>] child_rip+0xa/0x12
[ 1032.941087]  [<ffffffff802504f4>] kthreadd+0x114/0x1a0
[ 1032.941089]  [<ffffffff80250360>] kthread+0x0/0x80
[ 1032.941091]  [<ffffffff8020c41e>] child_rip+0x0/0x12
[ 1032.941092]
[ 1032.941093] INFO: lockdep is turned off. 
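
To summarise which tasks the hung-task detector flagged in a long dmesg
capture, the "blocked for more than N seconds" lines can be pulled out
mechanically. The following is only a sketch in Python: the function name
hung_tasks and the regular expression are assumptions derived from the
message format shown above, not a kernel-provided tool.

```python
import re

# Pattern modelled on the "INFO: task <comm>:<pid> blocked for more than
# <N> seconds." lines in the log above (an assumption about the format).
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(dmesg_text):
    """Return a list of (comm, pid, seconds) for each hung-task report."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_RE.finditer(dmesg_text)
    ]


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python hung_tasks.py
    for comm, pid, secs in hung_tasks(sys.stdin.read()):
        print(f"{comm} (pid {pid}) blocked for more than {secs} s")
```

Fed the excerpt above, this would report xfsdatad/0 (pid 317) and
pdflush (pid 27505).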

After several recompilations (some of which didn't finish because of
the problem described above, so a hard reset of the machine was
necessary; neither reboot nor CTRL+ALT+DEL worked), while trying
to reproduce the exact steps that trigger the problem, I noticed
this message in dmesg's output: 

Apr 30 14:38:50 nomad64 kernel: [  393.369740] ld used greatest stack depth: 3968 bytes left 

I hope it gives you a clue, as I don't know where to look next. 

Somehow this message also got into the /var/log/syslog file; it appeared
when the compilation, started with "time make -j 4", was about to finish. 

I had nothing else special running: a KDE desktop with gkrellm, konsole
with 5 open tabs (in one of which I ran the kernel compilation),
and a konqueror window with 10 tabs, which I use to read Qt's docs. 

My hardware is: mobo - Gigabyte GA-P35-DS3R; CPU - Intel E4300;
2 GB DDR2 RAM; one 500GB WD SATA300 HDD; 2 LG optical drives (PATA); 

The software I use is the unofficial 64bit Slackware port -
Bluewhite64 12. 

I'm using XFS partitions under Linux, plus 5 NTFS partitions mounted with ntfs-3g. 

I'm attaching 2 files containing similar output from dmesg, and a
copy of the .config file used for the kernel build. If you
need more info, or if I should try something, just say so. 
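
One piece of extra information that could be collected the next time the
stall happens is the list of tasks currently in the D state. A minimal
sketch in Python, assuming a Linux /proc filesystem where each
/proc/<pid>/stat line has the form "pid (comm) state ..."; the function
names parse_stat_line and tasks_in_d_state are hypothetical helpers made
up for this illustration:

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    start = line.index("(")
    end = line.rindex(")")
    pid = int(line[:start].strip())
    comm = line[start + 1:end]
    state = line[end + 1:].split()[0]
    return pid, comm, state


def tasks_in_d_state(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for tasks in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in tasks_in_d_state():
        print(pid, comm)
```

Running it a few times during a stall and comparing the output would show
whether the same tasks stay blocked.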

Please advise what to do next. 

Thanks for your time,
-- 
Plamen Petrov, network & system administrator
Filial - Silistra
RU "Angel Kantchev"
http://fs.ru.acad.bg/
 --------------------------------
this message is UTF8 encoded 


View attachment "kernel-config-used.txt" of type "text/plain" (45573 bytes)

View attachment "dmesg-30-04-2008-1.txt" of type "text/plain" (52764 bytes)

View attachment "dmesg-30-04-2008-2.txt" of type "text/plain" (58231 bytes)