linux-kernel - Re: nfsd stuckage

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1231426650.11687.459.camel@twins>
Date:	Thu, 08 Jan 2009 15:57:30 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
	Neil Brown <neilb@...e.de>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	Christoph Hellwig <hch@...radead.org>
Subject: Re: nfsd stuckage

On Tue, 2009-01-06 at 14:56 -0800, Andrew Morton wrote:
> I just built current mainline plus the just-sent 266 -mm patches.
> 
> The machine failed to power off when hit with `halt -pfn'.  dmesg output:

> [  672.162677] INFO: task nfsd4:4324 blocked for more than 480 seconds.
> [  672.162706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  672.162725]  ffff880251df1d60 0000000000000046 ffff88025e1c0580 ffff8802488013d8
> [  672.162753]  ffff880251df1d20 ffff88024c49a7a0 ffff88025e088760 ffff88024c49ab18
> [  672.162834]  000000002807ee00 00000000ffff59ec ffff880251df1d50 0000000000000282
> [  672.162865] Call Trace:
> [  672.162880]  [<ffffffff8052a1d8>] __mutex_lock_slowpath+0x6a/0xac
> [  672.162895]  [<ffffffff8052a0c5>] mutex_lock+0x2c/0x30
> [  672.162908]  [<ffffffff802c4747>] vfs_fsync+0x63/0xa9
> [  672.162933]  [<ffffffffa048c74c>] nfsd_sync_dir+0x10/0x12 [nfsd]
> [  672.162960]  [<ffffffffa04a5427>] nfsd4_sync_rec_dir+0x27/0x40 [nfsd]
> [  672.162984]  [<ffffffffa04a592a>] nfsd4_recdir_purge_old+0x3d/0x6a [nfsd]
> [  672.163023]  [<ffffffffa04a1745>] laundromat_main+0x62/0x225 [nfsd]
> [  672.163049]  [<ffffffffa04a16e3>] ? laundromat_main+0x0/0x225 [nfsd]
> [  672.163064]  [<ffffffff8024b4a7>] run_workqueue+0x8d/0x124
> [  672.163076]  [<ffffffff8024b5e0>] ? worker_thread+0x0/0xe5
> [  672.163089]  [<ffffffff8024b6b8>] worker_thread+0xd8/0xe5
> [  672.163102]  [<ffffffff8024e8cc>] ? autoremove_wake_function+0x0/0x36
> [  672.163115]  [<ffffffff8024b5e0>] ? worker_thread+0x0/0xe5
> [  672.163127]  [<ffffffff8024e5e0>] kthread+0x44/0x6b
> [  672.163140]  [<ffffffff8020cfba>] child_rip+0xa/0x20
> [  672.163151]  [<ffffffff8024e59c>] ? kthread+0x0/0x6b
> [  672.163162]  [<ffffffff8020cfb0>] ? child_rip+0x0/0x20

FWIW lockdep seems to warn about this...

All I have to do to trigger this is boot the machine and let it sit for
a few minutes.

[  113.552497] =============================================
[  113.553289] [ INFO: possible recursive locking detected ]
[  113.553289] 2.6.28-tip #592                              
[  113.553289] ---------------------------------------------
[  113.553289] nfsd4/1914 is trying to acquire lock:        
[  113.553289]  (&type->i_mutex_dir_key#4){--..}, at: [<ffffffff802e7e5e>] vfs_fsync+0x6c/0xb1
[  113.553289]                                                                                
[  113.553289] but task is already holding lock:                                              
[  113.553289]  (&type->i_mutex_dir_key#4){--..}, at: [<ffffffffa0190727>] nfsd4_sync_rec_dir+0x22/0x47 [nfsd]                                                                                                        
[  113.553289]                                                                                             
[  113.553289] other info that might help us debug this:                                                   
[  113.553289] 4 locks held by nfsd4/1914:                                                                 
[  113.553289]  #0:  (nfsd4){--..}, at: [<ffffffff80252303>] run_workqueue+0xb6/0x21b                      
[  113.553289]  #1:  ((laundromat_work).work){--..}, at: [<ffffffff80252303>] run_workqueue+0xb6/0x21b     
[  113.553289]  #2:  (client_mutex){--..}, at: [<ffffffffa018bd05>] laundromat_main+0x33/0x24e [nfsd]      
[  113.553289]  #3:  (&type->i_mutex_dir_key#4){--..}, at: [<ffffffffa0190727>] nfsd4_sync_rec_dir+0x22/0x47 [nfsd]                                                                                                   
[  113.553289]                                                                                             
[  113.553289] stack backtrace:                                                                            
[  113.553289] Pid: 1914, comm: nfsd4 Not tainted 2.6.28-tip #592                                          
[  113.553289] Call Trace:                                                                                 
[  113.553289]  [<ffffffff80266987>] __lock_acquire+0xe42/0x161a                                           
[  113.553289]  [<ffffffff80288857>] ? __call_rcu+0x7a/0x107                                               
[  113.553289]  [<ffffffff802671b4>] lock_acquire+0x55/0x71                                                
[  113.553289]  [<ffffffff802e7e5e>] ? vfs_fsync+0x6c/0xb1                                                 
[  113.553289]  [<ffffffff805568d0>] mutex_lock_nested+0x4e/0x320                                          
[  113.553289]  [<ffffffff802e7e5e>] ? vfs_fsync+0x6c/0xb1                                                 
[  113.553289]  [<ffffffff8029bde0>] ? __filemap_fdatawrite_range+0x57/0x5f                                
[  113.553289]  [<ffffffff802e7e5e>] vfs_fsync+0x6c/0xb1                                                   
[  113.553289]  [<ffffffffa0176f8f>] nfsd_sync_dir+0x15/0x17 [nfsd]                                        
[  113.553289]  [<ffffffffa0190733>] nfsd4_sync_rec_dir+0x2e/0x47 [nfsd]                                   
[  113.553289]  [<ffffffffa0190791>] nfsd4_recdir_purge_old+0x45/0x73 [nfsd]                               
[  113.553289]  [<ffffffffa018bd44>] laundromat_main+0x72/0x24e [nfsd]                                     
[  113.553289]  [<ffffffff80252355>] run_workqueue+0x108/0x21b                                             
[  113.553289]  [<ffffffff80252303>] ? run_workqueue+0xb6/0x21b                                            
[  113.553289]  [<ffffffffa018bcd2>] ? laundromat_main+0x0/0x24e [nfsd]                                    
[  113.553289]  [<ffffffff8025254d>] worker_thread+0xe5/0xf6                                               
[  113.553289]  [<ffffffff80256615>] ? autoremove_wake_function+0x0/0x3d                                   
[  113.553289]  [<ffffffff80252468>] ? worker_thread+0x0/0xf6                                              
[  113.553289]  [<ffffffff80256200>] kthread+0x4e/0x7b                                                     
[  113.553289]  [<ffffffff8020d51a>] child_rip+0xa/0x20                                                    
[  113.553289]  [<ffffffff8020cec0>] ? restore_args+0x0/0x30                                               
[  113.553289]  [<ffffffff802561b2>] ? kthread+0x0/0x7b                                                    
[  113.553289]  [<ffffffff8020d510>] ? child_rip+0x0/0x20 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/