Message-ID: <20110321232857.GC472@fieldses.org>
Date: Mon, 21 Mar 2011 19:28:57 -0400
From: "J. Bruce Fields" <bfields@...ldses.org>
To: Wolfgang Walter <wolfgang.walter@...m.de>
Cc: Trond Myklebust <Trond.Myklebust@...app.com>,
linux-kernel@...r.kernel.org, linux-nfs@...r.kernel.org
Subject: Re: problem with nfs4: rpciod seems to loop in rpc_shutdown_client forever

On Fri, Mar 18, 2011 at 11:49:21PM +0100, Wolfgang Walter wrote:
> Hello,
>
> I have a problem with our nfs-server (stable 2.6.32.33, but also with
> .31 or .32 and probably older ones): sometimes one or more rpciod
> threads get stuck. I used
>
> rpcdebug -m rpc -s all
>
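(For reference, and assuming the stock rpcdebug from nfs-utils: -s sets
debug flags and -c clears them again, so something like

	rpcdebug -m rpc -s all		# enable all sunrpc debug output
	rpcdebug -m rpc -c all		# and switch it off again

should toggle this; the messages end up in the kernel log / dmesg.)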
> I get messages like the following about every second:
>
> Mar 18 11:15:37 au kernel: [44640.906793] RPC: killing all tasks for client ffff88041c51de00
> Mar 18 11:15:38 au kernel: [44641.906793] RPC: killing all tasks for client ffff88041c51de00
> Mar 18 11:15:39 au kernel: [44642.906795] RPC: killing all tasks for client ffff88041c51de00
> Mar 18 11:15:40 au kernel: [44643.906793] RPC: killing all tasks for client ffff88041c51de00
> Mar 18 11:15:41 au kernel: [44644.906795] RPC: killing all tasks for client ffff88041c51de00
> Mar 18 11:15:42 au kernel: [44645.906794] RPC: killing all tasks for client ffff88041c51de00
>
> and I get this message:
>
> Mar 18 22:45:57 au kernel: [86061.779008] 174 0381 -5 ffff88041c51de00 (null) 0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:rpc_exit_task q:none
>
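(That per-task line looks like the output of rpc_show_tasks(). If I
remember the trigger correctly, on these kernels any write to the sunrpc
debug sysctl dumps all pending RPC tasks, e.g.

	echo 0 > /proc/sys/sunrpc/rpc_debug

which is presumably how the full task lists further down were taken.)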
> My theory is this:
>
> * this async task is runnable but does not make progress (its next
>   action would be rpc_exit_task).
> * this is because the same rpciod that handles this task is looping in
>   rpc_shutdown_client, waiting for this task to go away.
> * and this can happen because rpc_shutdown_client is called from an
>   async rpc, too.
Offhand I don't see any place where rpc_shutdown_client() is called
from rpciod; do you?
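
For reference, the once-a-second "killing all tasks" messages do match
the wait loop in rpc_shutdown_client() (net/sunrpc/clnt.c); roughly, in
the 2.6.32-era code (paraphrased from memory, so details may differ):

	void rpc_shutdown_client(struct rpc_clnt *clnt)
	{
		while (!list_empty(&clnt->cl_tasks)) {
			/* logs "RPC: killing all tasks for client %p" */
			rpc_killall_tasks(clnt);
			/* wait up to one second for the tasks to drain */
			wait_event_timeout(destroy_wait,
				list_empty(&clnt->cl_tasks), 1*HZ);
		}
		rpc_release_client(clnt);
	}

So whoever calls this prints one such line per second for as long as
some task on that client refuses to exit; if the caller were itself the
rpciod that has to run the remaining task, it would loop forever, which
is what the theory above requires -- the open question being whether
such a call path actually exists.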
> At the beginning it is always one or more tasks like the one above.
>
> Once an rpciod hangs, more and more other tasks hang forever:
>
> Mar 18 22:45:57 au kernel: [86061.778809] -pid- flgs status -client- --rqstp- -timeout ---ops--
> Mar 18 22:45:57 au kernel: [86061.778819] 300 0281 -13 ffff8801ef5d0600 (null) 0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:call_refreshresult q:none
> Mar 18 22:45:57 au kernel: [86061.778823] 289 0281 0 ffff880142a49800 ffff8802a1dde000 0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
> Mar 18 22:45:57 au kernel: [86061.778827] 286 0281 0 ffff880349f57e00 ffff88010affe000 0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
> Mar 18 22:45:57 au kernel: [86061.778830] 283 0281 0 ffff88041d19ac00 ffff880418650000 0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
There are a lot of these GETPORT calls.  Is portmap/rpcbind down?
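A quick way to check from the server itself, assuming a stock rpcinfo:

	rpcinfo -p localhost	# should list portmapper, nlockmgr, ...

If that fails or hangs, the stuck GETPORT calls (and the lockd tasks
queued up behind them on xprt_backlog below) would follow from that.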
--b.
> Mar 18 22:45:57 au kernel: [86061.778833] 280 0281 0 ffff880380561400 ffff8803d3b7a000 0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
> Mar 18 22:45:57 au kernel: [86061.778837] 277 0281 0 ffff8803ef73ac00 ffff88010af50000 0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
> Mar 18 22:45:57 au kernel: [86061.778840] 274 0281 0 ffff880302c31c00 ffff8803e92b6000 0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
> Mar 18 22:45:57 au kernel: [86061.778844] 271 0281 0 ffff8803c44d0600 ffff88002687e000 0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
> Mar 18 22:45:57 au kernel: [86061.778847] 234 0281 -110 ffff8801f414b000 ffff880005382000 0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_connect_status q:none
> Mar 18 22:45:57 au kernel: [86061.778851] 233 0281 -110 ffff8802c9945800 ffff880050382000 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778855] 235 0281 -110 ffff8802c9945800 ffff880050382150 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778858] 236 0281 -110 ffff8802c9945800 ffff8800503822a0 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778861] 237 0281 -110 ffff8802c9945800 ffff8800503823f0 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778864] 238 0281 -110 ffff8802c9945800 ffff880050382540 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778867] 239 0281 -110 ffff8802c9945800 ffff880050382690 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778870] 240 0281 -110 ffff8802c9945800 ffff8800503827e0 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778874] 241 0281 -110 ffff8802c9945800 ffff880050382930 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778877] 242 0281 -110 ffff8802c9945800 ffff880050382a80 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778880] 243 0281 -110 ffff8802c9945800 ffff880050382bd0 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778884] 244 0281 -110 ffff8802c9945800 ffff880050382d20 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778887] 245 0281 -110 ffff8802c9945800 ffff880050382e70 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778890] 246 0281 -110 ffff8802c9945800 ffff880050382fc0 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778893] 247 0281 -110 ffff8802c9945800 ffff880050383110 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778896] 248 0281 -110 ffff8802c9945800 ffff880050383260 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778899] 249 0281 -110 ffff8802c9945800 ffff8800503833b0 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
> Mar 18 22:45:57 au kernel: [86061.778903] 250 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778906] 251 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778909] 252 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778912] 253 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778915] 254 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778919] 255 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778922] 256 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778925] 257 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778928] 258 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778931] 259 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778935] 260 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778938] 261 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778941] 262 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778945] 263 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778948] 264 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778951] 265 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778954] 266 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778958] 267 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778961] 268 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778964] 269 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778967] 272 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778971] 275 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778974] 278 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778977] 281 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778980] 284 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778983] 287 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778987] 290 0281 -11 ffff8802c9945800 (null) 0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
> Mar 18 22:45:57 au kernel: [86061.778990] 232 0281 -110 ffff8803d3bc6000 ffff8803d5eaa000 0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_connect_status q:none
> Mar 18 22:45:57 au kernel: [86061.778993] 230 0281 -13 ffff8801ef5d0200 (null) 0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:call_refreshresult q:none
> Mar 18 22:45:57 au kernel: [86061.778997] 180 0281 -13 ffff880341e36800 (null) 0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:call_refreshresult q:none
> Mar 18 22:45:57 au kernel: [86061.779000] 179 0281 -13 ffff88027dc01000 (null) 0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:call_refreshresult q:none
> Mar 18 22:45:57 au kernel: [86061.779003] 178 0281 -13 ffff8801bddac600 (null) 0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:call_refreshresult q:none
> Mar 18 22:45:57 au kernel: [86061.779008] 174 0381 -5 ffff88041c51de00 (null) 0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:rpc_exit_task q:none
>
> I think I saw hanging rpciods once in a while before, but since we
> started using Kerberos it happens very often.
>
>
> My theory may be completely bogus and there may be another reason; I'm
> not familiar with the code.
>
> Regards,
> --
> Wolfgang Walter
> Studentenwerk München
> Anstalt des öffentlichen Rechts