lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 18 Apr 2007 09:15:31 -0400
From:	Trond Myklebust <Trond.Myklebust@...app.com>
To:	Florin Iucha <florin@...ha.net>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Adrian Bunk <bunk@...sta.de>,
	OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

On Wed, 2007-04-18 at 07:38 -0500, Florin Iucha wrote:
> On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
> > Florin, can we please see /proc/meminfo as well?
> 
>    http://iucha.net/nfs/21-rc7-nfs2/meminfo
> 
> > Also the result of `echo m > /proc/sysrq-trigger'
> 
>    http://iucha.net/nfs/21-rc7-nfs2/big-copy
> 
> This has 'echo m > /proc/sysrq-trigger', 'echo t >
> /proc/sysrq-trigger' and 'echo 0 > /proc/sys/sunrpc/rpc_debug'.

Thanks.

So it looks as if you have a massive backlog of requests waiting in the
RPC layer to get sent. That would indeed trigger the BDI congestion
control stuff, and prevent you from sending more requests. The
interesting bit is this:

[  399.665314] -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
[  399.665338] 40373 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0de0        0 xprt_resend ffffffff804196bf ffffffff80440b10
[  399.665345] 40391 0001 0001    -11 ffff81007f418508 100003 ffff810078eb05c8        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665351] 40392 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0128        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665358] 40393 0001 0001    -11 ffff81007f418508 100003 ffff810078eb1158        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665364] 40394 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0f08        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665371] 40395 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0000        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665377] 40396 0001 0001    -11 ffff81007f418508 100003 ffff810078eb1030        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665384] 40397 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0cb8        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665390] 40398 0001 0001    -11 ffff81007f418508 100003 ffff810078eb06f0        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665397] 40399 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0940        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665404] 40400 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0818        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665410] 40401 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0378        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665417] 40402 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0250        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.669252] 41086 0001 0001      0 ffff81007f418508 100003 ffff810078eb0a68    15000 xprt_pending ffffffff804196bf ffffffff80440b10
[  399.669258] 41087 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0b90        0 xprt_resend ffffffff804196bf ffffffff80440b10
[  399.669265] 41088 0001 0001    -11 ffff81007f418508 100003 ffff810078eb04a0        0 xprt_sending ffffffff804196bf ffffffff80440b10

There is only one request on the 'pending' queue. That would usually
indicate that the connection to the server is down. Can you check using
"netstat -t" whether or not there is a connection in the 'ESTABLISHED'
state to the server? Please also repeat the command a couple of times in
order to see if the socket/port number on the connection changes.

Cheers
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ