[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <201103182349.22331.wolfgang.walter@stwm.de>
Date:	Fri, 18 Mar 2011 23:49:21 +0100
From:	Wolfgang Walter <wolfgang.walter@...m.de>
To:	Trond Myklebust <Trond.Myklebust@...app.com>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	linux-kernel@...r.kernel.org, linux-nfs@...r.kernel.org
Subject: problem with nfs4: rpciod seems to loop in rpc_shutdown_client forever

Hello,

I have a problem with our NFS server (stable 2.6.32.33, but also with
.31 and .32, and probably older kernels): sometimes
one or more rpciod threads get stuck. I enabled RPC debugging with

	rpcdebug -m rpc -s all

Then I get messages like the following about once per second:

Mar 18 11:15:37 au kernel: [44640.906793] RPC:       killing all tasks for client ffff88041c51de00
Mar 18 11:15:38 au kernel: [44641.906793] RPC:       killing all tasks for client ffff88041c51de00
Mar 18 11:15:39 au kernel: [44642.906795] RPC:       killing all tasks for client ffff88041c51de00
Mar 18 11:15:40 au kernel: [44643.906793] RPC:       killing all tasks for client ffff88041c51de00
Mar 18 11:15:41 au kernel: [44644.906795] RPC:       killing all tasks for client ffff88041c51de00
Mar 18 11:15:42 au kernel: [44645.906794] RPC:       killing all tasks for client ffff88041c51de00

and I get this message:

Mar 18 22:45:57 au kernel: [86061.779008]   174 0381     -5 ffff88041c51de00   (null)        0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:rpc_exit_task q:none

My theory is this:

* This async task is runnable but makes no progress (it never gets to run
  rpc_exit_task).
* That is because the rpciod which should handle this task is looping in
  rpc_shutdown_client, waiting for this very task to go away.
* This can happen because rpc_shutdown_client is itself called from an
  async RPC, too.
At the beginning there are always one or more tasks like the one above.

Once an rpciod hangs, more and more other tasks hang forever:

Mar 18 22:45:57 au kernel: [86061.778809] -pid- flgs status -client- --rqstp- -timeout ---ops--
Mar 18 22:45:57 au kernel: [86061.778819]   300 0281    -13 ffff8801ef5d0600   (null)        0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:call_refreshresult q:none
Mar 18 22:45:57 au kernel: [86061.778823]   289 0281      0 ffff880142a49800 ffff8802a1dde000        0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
Mar 18 22:45:57 au kernel: [86061.778827]   286 0281      0 ffff880349f57e00 ffff88010affe000        0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
Mar 18 22:45:57 au kernel: [86061.778830]   283 0281      0 ffff88041d19ac00 ffff880418650000        0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
Mar 18 22:45:57 au kernel: [86061.778833]   280 0281      0 ffff880380561400 ffff8803d3b7a000        0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
Mar 18 22:45:57 au kernel: [86061.778837]   277 0281      0 ffff8803ef73ac00 ffff88010af50000        0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
Mar 18 22:45:57 au kernel: [86061.778840]   274 0281      0 ffff880302c31c00 ffff8803e92b6000        0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
Mar 18 22:45:57 au kernel: [86061.778844]   271 0281      0 ffff8803c44d0600 ffff88002687e000        0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_status q:none
Mar 18 22:45:57 au kernel: [86061.778847]   234 0281   -110 ffff8801f414b000 ffff880005382000        0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_connect_status q:none
Mar 18 22:45:57 au kernel: [86061.778851]   233 0281   -110 ffff8802c9945800 ffff880050382000        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778855]   235 0281   -110 ffff8802c9945800 ffff880050382150        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778858]   236 0281   -110 ffff8802c9945800 ffff8800503822a0        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778861]   237 0281   -110 ffff8802c9945800 ffff8800503823f0        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778864]   238 0281   -110 ffff8802c9945800 ffff880050382540        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778867]   239 0281   -110 ffff8802c9945800 ffff880050382690        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778870]   240 0281   -110 ffff8802c9945800 ffff8800503827e0        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778874]   241 0281   -110 ffff8802c9945800 ffff880050382930        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778877]   242 0281   -110 ffff8802c9945800 ffff880050382a80        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778880]   243 0281   -110 ffff8802c9945800 ffff880050382bd0        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778884]   244 0281   -110 ffff8802c9945800 ffff880050382d20        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778887]   245 0281   -110 ffff8802c9945800 ffff880050382e70        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778890]   246 0281   -110 ffff8802c9945800 ffff880050382fc0        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778893]   247 0281   -110 ffff8802c9945800 ffff880050383110        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778896]   248 0281   -110 ffff8802c9945800 ffff880050383260        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778899]   249 0281   -110 ffff8802c9945800 ffff8800503833b0        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_bind_status q:none
Mar 18 22:45:57 au kernel: [86061.778903]   250 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778906]   251 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778909]   252 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778912]   253 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778915]   254 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778919]   255 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778922]   256 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778925]   257 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778928]   258 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778931]   259 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778935]   260 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778938]   261 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778941]   262 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778945]   263 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778948]   264 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778951]   265 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778954]   266 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778958]   267 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778961]   268 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778964]   269 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 UNLOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778967]   272 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778971]   275 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778974]   278 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778977]   281 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778980]   284 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778983]   287 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778987]   290 0281    -11 ffff8802c9945800   (null)        0 ffffffff81721440 lockdv4 LOCK_RES a:call_reserveresult q:xprt_backlog
Mar 18 22:45:57 au kernel: [86061.778990]   232 0281   -110 ffff8803d3bc6000 ffff8803d5eaa000        0 ffffffff817a3fd0 rpcbindv2 GETPORT a:call_connect_status q:none
Mar 18 22:45:57 au kernel: [86061.778993]   230 0281    -13 ffff8801ef5d0200   (null)        0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:call_refreshresult q:none
Mar 18 22:45:57 au kernel: [86061.778997]   180 0281    -13 ffff880341e36800   (null)        0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:call_refreshresult q:none
Mar 18 22:45:57 au kernel: [86061.779000]   179 0281    -13 ffff88027dc01000   (null)        0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:call_refreshresult q:none
Mar 18 22:45:57 au kernel: [86061.779003]   178 0281    -13 ffff8801bddac600   (null)        0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:call_refreshresult q:none
Mar 18 22:45:57 au kernel: [86061.779008]   174 0381     -5 ffff88041c51de00   (null)        0 ffffffff817211a0 nfs4_cbv1 CB_NULL a:rpc_exit_task q:none

I think I occasionally saw hanging rpciods before, but since we started
using Kerberos it happens very often.


My theory may be completely bogus and there may be another reason; I'm not
familiar with the code.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts