[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <4589A32A.7030802@exegy.com>
Date: Wed, 20 Dec 2006 14:55:06 -0600
From: "Mr. Berkley Shands" <bshands@...gy.com>
To: linux-kernel@...r.kernel.org
CC: Dave Lloyd <dlloyd@...gy.com>
Subject: 2.6.19 xfs crash spawns tcp reclen errors off nfs
We have had several really strange NFS issues, reported out of
net/sunrpc/svcsock.c in
static int
svc_tcp_recvfrom(struct svc_rqst *rqstp)
svsk->sk_reclen &= 0x7fffffff;
dprintk("svc: TCP record, %d bytes\n", svsk->sk_reclen);
if (svsk->sk_reclen > serv->sv_bufsz) {
printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (large)\n",
(unsigned long) svsk->sk_reclen);
printk(KERN_NOTICE "sockaddr_in 0x%04x port 0x%04x max
0x%08lx\n",
rqstp->rq_addr.sin_addr.s_addr,
rqstp->rq_addr.sin_port, (long) serv->sv_bufsz);
goto err_delete;
}
The size reported grows, and grows, and...
the server runs 2.6.19 when XFS has its issues (as follows)
2TB partition, NFS exported. All machines are x86_64 AMD 275's with 16GB
and running redhat AS 4.4 (Centos 4.4)
a typical crash under 2.6.19 is
Dec 14 09:13:48 sanandreas kernel: Filesystem "sdb1": XFS internal error
xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller
0xffffffff881ceaa8
Dec 14 09:13:48 sanandreas kernel:
Dec 14 09:13:48 sanandreas kernel: Call Trace:
Dec 14 09:13:48 sanandreas kernel: [<ffffffff881c79e9>]
:xfs:xfs_trans_cancel+0x5f/0xfa
Dec 14 09:13:48 sanandreas kernel: [<ffffffff881ceaa8>]
:xfs:xfs_create+0x62e/0x686
Dec 14 09:13:48 sanandreas kernel: [<ffffffff802f62be>] __up_read+0x10/0x8a
Dec 14 09:13:48 sanandreas kernel: [<ffffffff881d8730>]
:xfs:xfs_vn_mknod+0x1a2/0x3c5
Dec 14 09:13:48 sanandreas kernel: [<ffffffff804440a6>]
_spin_lock_irqsave+0x9/0xe
Dec 14 09:13:48 sanandreas kernel: [<ffffffff802f62be>] __up_read+0x10/0x8a
Dec 14 09:13:48 sanandreas kernel: [<ffffffff881c8413>]
:xfs:xfs_trans_unlocked_item+0x22/0x3c
Dec 14 09:13:48 sanandreas kernel: [<ffffffff881cd4c3>]
:xfs:xfs_access+0x3d/0x46
Dec 14 09:13:48 sanandreas kernel: [<ffffffff881d8dce>]
:xfs:xfs_vn_permission+0x11/0x15
Dec 14 09:13:48 sanandreas kernel: [<ffffffff8027fe9f>]
permission+0xcd/0xd6
Dec 14 09:13:48 sanandreas kernel: [<ffffffff80280615>]
__link_path_walk+0x16c/0xdb7
Dec 14 09:13:48 sanandreas kernel: [<ffffffff8028d8f1>]
mntput_no_expire+0x17/0x8f
Dec 14 09:13:48 sanandreas kernel: [<ffffffff80281322>]
link_path_walk+0xc2/0xd0
Dec 14 09:13:48 sanandreas kernel: [<ffffffff80281bde>]
vfs_create+0xdf/0x14b
Dec 14 09:13:48 sanandreas kernel: [<ffffffff8028200c>]
open_namei+0x18e/0x662
Dec 14 09:13:48 sanandreas kernel: [<ffffffff802789a1>]
do_filp_open+0x1c/0x3d
Dec 14 09:13:48 sanandreas kernel: [<ffffffff80278b92>]
get_unused_fd+0xea/0xf9
Dec 14 09:13:48 sanandreas kernel: [<ffffffff80278cba>]
do_sys_open+0x44/0xbe
Dec 14 09:13:48 sanandreas kernel: [<ffffffff8020979e>]
system_call+0x7e/0x83
Dec 14 09:13:48 sanandreas kernel:
Dec 14 09:13:48 sanandreas kernel: xfs_force_shutdown(sdb1,0x8) called
from line 1139 of file fs/xfs/xfs_trans.c. Return address =
0xffffffff881c7a07
Dec 14 09:13:48 sanandreas kernel: Filesystem "sdb1": Corruption of
in-memory data detected. Shutting down filesystem: sdb1
Dec 14 09:13:48 sanandreas kernel: Please umount the filesystem, and
rectify the problem(s)
this then causes the heavy write clients to write garbage NFS request that
grow...
Dec 20 14:34:15 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max
0x00008400
Dec 20 14:34:18 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:18 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max
0x00008400
Dec 20 14:34:21 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:21 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max
0x00008400
Dec 20 14:34:24 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:24 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max
0x00008400
Dec 20 14:34:27 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:27 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max
0x00008400
Dec 20 14:34:30 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:30 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max
0x00008400
Dec 20 14:34:33 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)
Dec 20 14:34:33 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max
0x00008400
The clients run 2.6.19. the servers crash once a day when the heavy
simulations start up.
eth0 and eth1 are bonded into BOND0, tyan 2895 or SuperMicro H8DC8
motherboards.
Attached is a tarball, bzip'ed of the tcpdump traces and the
/var/log/messages file
bshands@...sh.eng.exegy.net home/bshands/Documents> ls -l reclen.tar.bz2
-rw-r--r-- 1 bshands dssi 24379 Dec 20 14:47 reclen.tar.bz2
bshands@...sh.eng.exegy.net home/bshands/Documents> md5sum reclen.tar.bz2
f449a3c83ca79cad04d8a63e6e253604 reclen.tar.bz2
bshands@...sh.eng.exegy.net /bulk/tmp> ls -l messages reclen.tcp*
-rw-r--r-- 1 bshands dssi 37005 Dec 20 14:35 messages
-rw-r--r-- 1 root dssi 57946 Dec 20 14:35 reclen.tcp0
-rw-r--r-- 1 root dssi 40678 Dec 20 14:35 reclen.tcp1
Neil Brown indicated that this NFS packet growth should have been fixed in
2.6.18-rc5, but it still happens, easily triggered by the XFS shutdown.
the packet sizes reported can grow to > 1GB.
Dropping back to 2.6.18.1 stops the XFS crashing, and hence stops the
NFS mess.
berkley
--
//E. F. Berkley Shands, MSc//
**Exegy Inc.**
3668 S. Geyer Road, Suite 300
St. Louis, MO 63127
Direct: (314) 450-5348
Cell: (314) 303-2546
Office: (314) 450-5353
Fax: (314) 450-5354
This e-mail and any documents accompanying it may contain legally
privileged and/or confidential information belonging to Exegy, Inc.
Such information may be protected from disclosure by law. The
information is intended for use by only the addressee. If you are not
the intended recipient, you are hereby notified that any disclosure or
use of the information is strictly prohibited. If you have received
this e-mail in error, please immediately contact the sender by e-mail or
phone regarding instructions for return or destruction and do not use or
disclose the content to others.
Download attachment "reclen.tar.bz2" of type "application/x-bzip2" (24379 bytes)
Powered by blists - more mailing lists