[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20100106190534.GD6612@fieldses.org>
Date: Wed, 6 Jan 2010 14:05:34 -0500
From: "J. Bruce Fields" <bfields@...ldses.org>
To: Nikola Ciprich <extmaillist@...uxbox.cz>
Cc: linux-nfs@...r.kernel.org, nikola.ciprich@...uxbox.cz,
Linux kernel list <linux-kernel@...r.kernel.org>,
linux-ext4@...r.kernel.org
Subject: Re: 2.6.32 nfs regression?
On Mon, Dec 28, 2009 at 01:10:18PM +0100, Nikola Ciprich wrote:
> Hi,
> I just upgraded two my machines to 2.6.32.1, and am getting lots of nfs-related
> error messages and backtraces.
>
> those two repeat quite often:
> [60144.870302] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
-ECONNRESET (I wonder why?)
> [60144.870436] nfsd: peername failed (err 107)!
This looks harmless.
> and the backtraces:
> [697321.291127] INFO: task nfsd:3707 blocked for more than 120 seconds.
> [697321.291132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [697321.291135] nfsd D 00000000ffffffff 0 3707 2 0x00000080
> [697321.291143] ffff880077997b30 0000000000000046 0000000000000000 ffffffff8106e995
> [697321.291150] ffff880000000000 ffff8800779980c0 ffff880077998000 ffffffff8153a220
> [697321.291157] ffff8800779983b8 ffff8800751d90f0 000000010a626ceb 0000000000000002
> [697321.291164] Call Trace:
> [697321.291175] [<ffffffff8106e995>] ? prepare_to_wait+0x25/0x80
> [697321.291187] [<ffffffffa05f3336>] start_this_handle+0x246/0x530 [jbd2]
> [697321.291194] [<ffffffff8106e760>] ? autoremove_wake_function+0x0/0x40
> [697321.291202] [<ffffffffa05f36e3>] jbd2_journal_start+0xc3/0xf0 [jbd2]
> [697321.291223] [<ffffffffa062fe31>] ext4_journal_start_sb+0x31/0x80 [ext4]
> [697321.291235] [<ffffffffa0620b95>] ext4_dirty_inode+0x25/0x60 [ext4]
> [697321.291242] [<ffffffff811364b5>] __mark_inode_dirty+0x35/0x180
> [697321.291247] [<ffffffff8112c67c>] inode_setattr+0x6c/0x180
> [697321.291259] [<ffffffffa062097b>] ext4_setattr+0x14b/0x340 [ext4]
This looks likely to be an ext4 problem, but I don't know.
--b.
> [697321.291264] [<ffffffff8112c907>] notify_change+0x177/0x350
> [697321.291279] [<ffffffffa04b1d8d>] nfsd_setattr+0x26d/0x510 [nfsd]
> [697321.291291] [<ffffffffa04b9cef>] nfsd3_proc_setattr+0x7f/0xd0 [nfsd]
> [697321.291301] [<ffffffffa04ac3b5>] nfsd_dispatch+0xb5/0x230 [nfsd]
> [697321.291323] [<ffffffffa03c6b07>] svc_process+0x477/0x780 [sunrpc]
> [697321.291333] [<ffffffffa04ac860>] ? nfsd+0x0/0x150 [nfsd]
> [697321.291342] [<ffffffffa04ac91d>] nfsd+0xbd/0x150 [nfsd]
> [697321.291347] [<ffffffff8106e61e>] kthread+0x8e/0xa0
> [697321.291352] [<ffffffff8100c31a>] child_rip+0xa/0x20
> [697321.291357] [<ffffffff8106e590>] ? kthread+0x0/0xa0
> [697321.291361] [<ffffffff8100c310>] ? child_rip+0x0/0x20
> [697321.291365] 2 locks held by nfsd/3707:
> [697321.291368] #0: (hash_sem){......}, at: [<ffffffffa04b5710>] exp_readlock+0x10/0x20 [nfsd]
> [697321.291383] #1: (&sb->s_type->i_mutex_key#14){......}, at: [<ffffffffa04b1ef7>] nfsd_setattr+0x3d7/0x510 [nfsd]
> [697321.291403] INFO: task jbd2/dm-7-8:4448 blocked for more than 120 seconds.
> [697321.291406] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [697321.291409] jbd2/dm-7-8 D 00000000ffffffff 0 4448 2 0x00000080
> [697321.291415] ffff8800722e7ce0 0000000000000046 0000000000000000 ffffffff8106e995
> [697321.291422] ffff880000000000 ffff8800722d4f00 ffff8800722d4e40 ffff88007f8ace40
> [697321.291428] ffff8800722d51f8 ffff8800751d9230 000000010a626d1b 0000000000000002
> [697321.291435] Call Trace:
> [697321.291440] [<ffffffff8106e995>] ? prepare_to_wait+0x25/0x80
> [697321.291449] [<ffffffffa05f4c1a>] jbd2_journal_commit_transaction+0x1ea/0x1890 [jbd2]
> [697321.291455] [<ffffffff8105e026>] ? lock_timer_base+0x36/0x70
> [697321.291460] [<ffffffff8106e760>] ? autoremove_wake_function+0x0/0x40
> [697321.291466] [<ffffffff8105e0a4>] ? try_to_del_timer_sync+0x44/0x100
> [697321.291475] [<ffffffffa05fbbba>] kjournald2+0xba/0x240 [jbd2]
> [697321.291479] [<ffffffff8106e760>] ? autoremove_wake_function+0x0/0x40
> [697321.291487] [<ffffffffa05fbb00>] ? kjournald2+0x0/0x240 [jbd2]
> [697321.291492] [<ffffffff8106e61e>] kthread+0x8e/0xa0
> [697321.291497] [<ffffffff8100c31a>] child_rip+0xa/0x20
> [697321.291501] [<ffffffff8106e590>] ? kthread+0x0/0xa0
> [697321.291506] [<ffffffff8100c310>] ? child_rip+0x0/0x20
> [697321.291509] no locks held by jbd2/dm-7-8/4448.
> [698041.288035] INFO: task nfsd:3703 blocked for more than 120 seconds.
> [698041.288040] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [698041.288044] nfsd D 00000000ffffffff 0 3703 2 0x00000080
> [698041.288051] ffff88007a309930 0000000000000046 0000000000000000 0000000000000000
> [698041.288058] 0000000000000000 ffff88007a1200c0 ffff88007a120000 ffffffff8153a220
> [698041.288065] ffff88007a12le_move+0x28/0x60
> [698041.288522] [<ffffffff81115db8>] ? file_move+0x28/0x60
> [698041.288527] [<ffffffff81115149>] vfs_writev+0x39/0x60
> [698041.288537] [<ffffffffa04b0d73>] nfsd_vfs_write+0x103/0x410 [nfsd]
> [698041.288543] [<ffffffff8111430d>] ? dentry_open+0x4d/0xb0
> [698041.288553] [<ffffffffa04b161c>] ? nfsd_open+0x15c/0x1e0 [nfsd]
> [698041.288563] [<ffffffffa04b1a25>] nfsd_write+0xe5/0x100 [nfsd]
> [698041.288575] [<ffffffffa04ba19e>] nfsd3_proc_write+0xfe/0x140 [nfsd]
> [698041.288584] [<ffffffffa04ac3b5>] nfsd_dispatch+0xb5/0x230 [nfsd]
> [698041.288599] [<ffffffffa03c6b07>] svc_process+0x477/0x780 [sunrpc]
> [698041.288610] [<ffffffffa04ac860>] ? nfsd+0x0/0x150 [nfsd]
> [698041.288618] [<ffffffffa04ac91d>] nfsd+0xbd/0x150 [nfsd]
> [698041.288623] [<ffffffff8106e61e>] kthread+0x8e/0xa0
> [698041.288628] [<ffffffff8100c31a>] child_rip+0xa/0x20
>
> there's ext4 mentioned, so I'm not 100% sure if it ain't ext4-related,
> but on second machine (without ext4) I'm getting similar messages:
>
> [704103.575633] nfsd: peername failed (err 107)!
> [704280.628046] INFO: task reiserfs/0:2637 blocked for more than 120 seconds.
> [704280.628050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [704280.628053] reiserfs/0 D 0000000000000000 0 2637 2 0x00000000
> [704280.628058] ffff880139267c80 0000000000000046 0000000000000000 000000008103d1e9
> [704280.628063] 0000000000000000 ffff88013f1400c0 ffff88013f140000 ffffffff8153a220
> [704280.628067] ffff88013f1403b8 0000000000000000 000000010a7cf74c 0000000000000000
> [704280.628071] Call Trace:
> [704280.628092] [<ffffffffa0216b75>] ? flush_commit_list+0x145/0x6f0 [reiserfs]
> [704280.628099] [<ffffffff81340a82>] mutex_lock_nested+0x132/0x330
> [704280.628106] [<ffffffffa0216b75>] ? flush_commit_list+0x145/0x6f0 [reiserfs]
> [704280.628115] [<ffffffffa0216b75>] flush_commit_list+0x145/0x6f0 [reiserfs]
> [704280.628123] [<ffffffffa0216c67>] flush_commit_list+0x237/0x6f0 [reiserfs]
> [704280.628131] [<ffffffffa021b200>] ? flush_async_commits+0x0/0x70 [reiserfs]
> [704280.628138] [<ffffffffa021b257>] flush_async_commits+0x57/0x70 [reiserfs]
> [704280.628144] [<ffffffff81069f85>] worker_thread+0x235/0x390
> [704280.628147] [<ffffffff81069f33>] ? worker_thread+0x1e3/0x390
> [704280.628151] [<ffffffff8106e760>] ? autoremove_wake_function+0x0/0x40
> [704280.628155] [<ffffffff81069d50>] ? worker_thread+0x0/0x390
> [704280.628158] [<ffffffff8106e61e>] kthread+0x8e/0xa0
> [704280.628162] [<ffffffff8100c31a>] child_rip+0xa/0x20
> [704280.628165] [<ffffffff8106e590>] ? kthread+0x0/0xa0
> [704280.628168] [<ffffffff8100c310>] ? child_rip+0x0/0x20
> [704280.628170] 3 locks held by reiserfs/0/2637:
> [704280.628172] #0: (reiserfs){......}, at: [<ffffffff81069f33>] worker_thread+0x1e3/0x390
> [704280.628178] #1: (&(&journal->j_work)->work){......}, at: [<ffffffff81069f33>] worker_thread+0x1e3/0x390
> [704280.628184] #2: (&jl->j_commit_mutex){......}, at: [<ffffffffa0216b75>] flush_commit_list+0x145/0x6f0 [reiserfs]
> [704329.500101] rpc-srv/tcp: nfsd: got error -32 when sending 140 bytes - shutting down socket
>
> which leads me to suspicion the problems are somewhere within nfs code...
> both machines are SMP x86_64.
> I'll update to 2.6.32.2 on Wednesday, but according to changelog, it doesn't
> seem to fix anything related...
> Could somebody please have a look at this? I'll be glad to provide further
> information if needed...
>
> --
> -------------------------------------
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28. rijna 168, 709 01 Ostrava
>
> tel.: +420 596 603 142
> fax: +420 596 621 273
> mobil: +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: servis@...uxbox.cz
> -------------------------------------
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists