lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1222510586.3949.45.camel@localhost.localdomain>
Date:	Sat, 27 Sep 2008 11:16:26 +0100
From:	Ian Campbell <ijc@...lion.org.uk>
To:	"J. Bruce Fields" <bfields@...ldses.org>
Cc:	Tom Tucker <tom@...ngridcomputing.com>,
	Trond Myklebust <trond.myklebust@....uio.no>,
	Grant Coady <gcoady.lk@...il.com>,
	linux-kernel@...r.kernel.org, neilb@...e.de,
	linux-nfs@...r.kernel.org
Subject: Re: NFS regression? Odd delays and lockups accessing an NFS export.

On Fri, 2008-09-26 at 23:54 -0400, J. Bruce Fields wrote:
> OK, so apologies, but this has been a long thread, and maybe we could
> use a summary of the symptoms and the results so far.  I think you said
> 2.6.24 or .25 was the last you're *positive* was good?

2.6.24.x was good 2.6.25.y was not. The Debian packages don't include
the stable release numbers in their version so I'm unsure of x and y,
Judging from the changelog entries I think 2.6.24.3 and 2.6.25.10. 

Since I have been building my own kernels I have seen repros with:

a551b98d5f6fce5897d497abd8bfb262efb33d2a Merge branch 'for-linus' of
git://git.kernel.dk/linux-2.6-block (was tip of git at the time I tested
it a while ago)

c02a79dedc7f3c3d4fdbb5eb2000cacea5df4cde v2.6.26.5

daedfbe2a67628a40076a6c75fb945c60f608a2e NFS: Ensure we zap only the
access and acl caches when setting new acls (backport of
f41f741838480aeaa3a189cff6e210503cf9c42d to 2.6.26.y stream, identified
by bisect, possibly/probably incorrectly)

f41f741838480aeaa3a189cff6e210503cf9c42d NFS: Ensure we zap only the
access and acl caches when setting new acls (original trunk commit of
above)

2e96d2867245668dbdb973729288cf69b9fafa66 NFS: Fix a warning in
nfs4_async_handle_error (commit before
f41f741838480aeaa3a189cff6e210503cf9c42d)

I did not see the problem with bce7f793daec3e65ec5c5705d2457b81fe7b5725
(v2.6.26) in 4 days 19 hours. Due to the apparent mis-bisection leading
to daedfbe2a67628a40076a6c75fb945c60f608a2e this result is in doubt, I'm
currently retesting this kernel.

The bisect was restricted to paths under fs/nfs* and net/sunrpc. I've
gone over the range v2.6.26..daedfbe2a67628a40076a6c75fb945c60f608a2e
and nothing jumps out at me. If my retest of 2.6.26 shows it to be OK
again I'll bisect over the non-NFS changesets in that range.

The symptoms are a hang of any access to an NFS mounted path, although
the rest of the system continues to work. 

Other potentially interesting info cribbed from earlier postings:

While the hang is occurring rpc_debug on the server gives nothing, on
the client gives:
[144741.637997] -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops-- 
[144741.637997]  3439 0004 0080    -11 f3f48200 100003 f7770000    0 xprt_sending fa0ae88e fa0bddf4
[144741.637997]  3438 0001 00a0      0 f77f2a00 100003 f77700d0 5000 xprt_pending fa0ae88e fa0bddf4

The mount points are
        hopkins:/storage/music /storage/music nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=255,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
        hopkins:/storage/mythtv/recordings /var/lib/mythtv/recordings nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
        hopkins:/var/lib/mythvideo /var/lib/mythvideo nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
        hopkins:/storage/home/ijc /home/ijc nfs rw,vers=3,rsize=131072,wsize=131072,namlen=255,hard,nointr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
all of them seem to be effected simultaneously (but I'll check to make double sure next time it occurs).

I've seen hung task backtraces from effected processes, e.g. 
        Aug  4 06:27:28 iranon kernel: [137969.382277] INFO: task mythbackend:3161 blocked for more than 120 seconds.
        Aug  4 06:27:28 iranon kernel: [137969.382287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        Aug  4 06:27:28 iranon kernel: [137969.382291] mythbackend   D 00005dfc     0  3161   1
        Aug  4 06:27:28 iranon kernel: [137969.382295]        f2006c70 00000082 4b05cc7e 00005dfc f2006df0 c1c0e920 00000000 f7273390 
        Aug  4 06:27:28 iranon kernel: [137969.382301]        0000c8b4 00000000 00000001 f7273398 00000282 00000202 f71b0ab0 f200ddf0 
        Aug  4 06:27:28 iranon kernel: [137969.382306]        c1c012bc fa18f0a6 fa18f0bf c02b45a7 f71b0ab0 00000000 f200de0c fa18f0a6 
        Aug  4 06:27:28 iranon kernel: [137969.382311] Call Trace:
        Aug  4 06:27:28 iranon kernel: [137969.382347]  [<fa18f0a6>] nfs_wait_schedule+0x0/0x1e [nfs]
        Aug  4 06:27:28 iranon kernel: [137969.382384]  [<fa18f0bf>] nfs_wait_schedule+0x19/0x1e [nfs]
        Aug  4 06:27:28 iranon kernel: [137969.382399]  [<c02b45a7>] __wait_on_bit_lock+0x2a/0x52
        Aug  4 06:27:28 iranon kernel: [137969.382407]  [<fa18f0a6>] nfs_wait_schedule+0x0/0x1e [nfs]
        Aug  4 06:27:28 iranon kernel: [137969.382421]  [<c02b462e>] out_of_line_wait_on_bit_lock+0x5f/0x67
        Aug  4 06:27:28 iranon kernel: [137969.382429]  [<c0133a1d>] wake_bit_function+0x0/0x3c
        Aug  4 06:27:28 iranon kernel: [137969.382441]  [<fa18f16e>] __nfs_revalidate_inode+0xaa/0x211 [nfs]
        Aug  4 06:27:28 iranon kernel: [137969.382458]  [<c017bac0>] do_lookup+0x53/0x145
        Aug  4 06:27:28 iranon kernel: [137969.382466]  [<c0187609>] mntput_no_expire+0x11/0x64
        Aug  4 06:27:28 iranon kernel: [137969.382472]  [<c017d971>] __link_path_walk+0xa71/0xb65
        Aug  4 06:27:28 iranon kernel: [137969.382477]  [<c017bac0>] do_lookup+0x53/0x145
        Aug  4 06:27:28 iranon kernel: [137969.382483]  [<c0187609>] mntput_no_expire+0x11/0x64
        Aug  4 06:27:28 iranon kernel: [137969.382492]  [<c0187609>] mntput_no_expire+0x11/0x64
        Aug  4 06:27:28 iranon kernel: [137969.382496]  [<c017daf5>] path_walk+0x90/0x98
        Aug  4 06:27:28 iranon kernel: [137969.382502]  [<fa18f935>] nfs_getattr+0x8f/0xbe [nfs]
        Aug  4 06:27:28 iranon kernel: [137969.382520]  [<fa18f8a6>] nfs_getattr+0x0/0xbe [nfs]
        Aug  4 06:27:28 iranon kernel: [137969.382536]  [<c01785ac>] vfs_getattr+0x36/0x4d
        Aug  4 06:27:28 iranon kernel: [137969.382545]  [<c0178684>] vfs_lstat_fd+0x27/0x39
        Aug  4 06:27:28 iranon kernel: [137969.382550]  [<fa18bca6>] nfs_permission+0x0/0x129 [nfs]
        Aug  4 06:27:28 iranon kernel: [137969.382567]  [<c0187609>] mntput_no_expire+0x11/0x64
        Aug  4 06:27:28 iranon kernel: [137969.382572]  [<c0174914>] sys_faccessat+0x11e/0x15d
        Aug  4 06:27:28 iranon kernel: [137969.382582]  [<c0178703>] sys_lstat64+0xf/0x23
        Aug  4 06:27:28 iranon kernel: [137969.382588]  [<c017603c>] vfs_read+0xe3/0x11e
        Aug  4 06:27:28 iranon kernel: [137969.382598]  [<c0174962>] sys_access+0xf/0x13
        Aug  4 06:27:28 iranon kernel: [137969.382603]  [<c01077e8>] sysenter_past_esp+0x6d/0xa5
        Aug  4 06:27:28 iranon kernel: [137969.382617]  =======================

Ian.
-- 
Ian Campbell

Contemptuous lights flashed across the computer's console.
		-- Hitchhiker's Guide to the Galaxy

Download attachment "signature.asc" of type "application/pgp-signature" (198 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ