Message-ID: <ABFC24E4C13D81489F7F624E14891C860310E10D51@uk-ex-mbx1.terastack.bluearc.com>
Date:	Thu, 22 Jul 2010 13:19:02 +0100
From:	Andy Chittenden <andyc@...earc.com>
To:	"Linux Kernel Mailing List (linux-kernel@...r.kernel.org)" 
	<linux-kernel@...r.kernel.org>
Subject: nfs client hang

We're encountering a bug similar to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578152, but that bug is reported as fixed in the version we're running:

# dpkg --status linux-image-2.6.32-5-amd64 | grep Version:
Version: 2.6.32-17

If I run the following in 4 different xterm windows, each having cd'd into the same NFS-mounted directory:

xterm1: rm -rf *
xterm2: while true; do     let iter+=1;     echo $iter;     dd if=/dev/zero of=$$ bs=1M count=1000; done
xterm3: while true; do     let iter+=1;     echo $iter;     dd if=/dev/zero of=$$ bs=1M count=1000; done
xterm4: while true; do     let iter+=1;     echo $iter;     dd if=/dev/zero of=$$ bs=1M count=1000; done

then it normally hangs before the 3rd iteration starts. The directory contains a lot of data (e.g. 5 Linux source trees).
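
A minimal sketch of the same write load driven from a single shell rather than three xterms. The function names, their arguments and the `writer.N` file names are hypothetical and not part of the original report. In a background job `$$` still expands to the parent shell's PID, so each writer is given its own id to keep the output files distinct. The `rm -rf *` from xterm1 is left out on purpose.

```shell
# Sketch only: writer/run_writers and their arguments are illustrative names.

writer() {
    # $1 = writer id, $2 = MiB per file, $3 = iterations (0 = loop forever)
    local id=$1 count=$2 max=$3 iter=0
    while [ "$max" -eq 0 ] || [ "$iter" -lt "$max" ]; do
        iter=$((iter + 1))
        echo "writer $id: iteration $iter"
        # Each writer rewrites its own file, as $$ gave each xterm its own file.
        # dd's transfer statistics on stderr are discarded here.
        dd if=/dev/zero of="writer.$id" bs=1M count="$count" 2>/dev/null
    done
}

run_writers() (
    # $1 = directory, $2 = number of writers, $3 = MiB per file, $4 = iterations
    # The function body is a subshell, so the cd does not affect the caller.
    cd "$1" || exit 1
    for i in $(seq 1 "$2"); do
        writer "$i" "$3" "$4" &
    done
    wait
)

# Load matching the report (three writers, 1000 MiB each, looping forever):
#   run_writers /u/u15 3 1000 0
```

Passing a non-zero iteration count bounds the run, which is convenient when checking the script on a local directory before pointing it at the NFS mount.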

When it gets into this hang state, here are the packets exchanged between the client and the server:

4	42.909478	172.18.0.39	10.1.6.102	TCP	1013 > nfs [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=108490 TSER=0 WS=0
5	42.909577	10.1.6.102	172.18.0.39	TCP	nfs > 1013 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460
6	42.909610	172.18.0.39	10.1.6.102	TCP	1013 > nfs [ACK] Seq=1 Ack=1 Win=5840 Len=0
7	42.909672	172.18.0.39	10.1.6.102	TCP	1013 > nfs [FIN, ACK] Seq=1 Ack=1 Win=5840 Len=0
8	42.909767	10.1.6.102	172.18.0.39	TCP	nfs > 1013 [ACK] Seq=1 Ack=2 Win=64240 Len=0
9	43.660083	10.1.6.102	172.18.0.39	TCP	nfs > 1013 [FIN, ACK] Seq=1 Ack=2 Win=64240 Len=0
10	43.660100	172.18.0.39	10.1.6.102	TCP	1013 > nfs [ACK] Seq=2 Ack=2 Win=5840 Len=0

and then the same sequence repeats after a while.

I.e. the client opens a connection and then closes it again without sending any data.
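
The "connects, then closes without sending data" pattern can be checked mechanically on a longer capture. The following is a rough sketch, assuming the trace has been saved as text in the same column layout as above. The function name `summarise_trace` and the file name `trace.txt` are hypothetical. It counts bare SYNs, FINs and the total of the `Len=` values.

```shell
# Sketch only: summarise a text trace in the layout shown above
# (number, time, source, destination, protocol, info).
summarise_trace() {
    awk '
        /\[SYN\]/      { syn++ }   # connection attempts (bare SYN, not SYN/ACK)
        /\[FIN, ACK\]/ { fin++ }   # FINs sent by either side
        # Add up the TCP payload length reported on each line.
        match($0, /Len=[0-9]+/) { bytes += substr($0, RSTART + 4, RLENGTH - 4) }
        END { printf "syn=%d fin=%d payload_bytes=%d\n", syn, fin, bytes }
    '
}

# Usage: summarise_trace < trace.txt
```

For the seven packets listed above this reports `syn=1 fin=2 payload_bytes=0`: one connection opened, closed from both sides, and no payload carried.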

FWIW, I've found it easier to reproduce this problem with Ethernet flow control off, but it still happens with it on. It also happens with different types of Ethernet hardware. The rm -rf isn't necessary either, but it makes the problem easier to reproduce (for me anyway).

The mount options are:

# mount | grep u15
sweet.dev.bluearc.com:/u15 on /u/u15 type nfs (rw,noatime,nodiratime,hard,intr,rsize=32768,wsize=32768,proto=tcp,hard,intr,rsize=32768,wsize=32768,sloppy,addr=10.1.6.102)

I've built a 2.6.34.1 kernel, and that has the same problem.

So, why would the Linux NFS client get into this state where it opens connections but doesn't transmit any data? NB 2.6.26 doesn't exhibit this problem.

-- 
Andy, BlueArc Engineering

