linux-kernel - Re: processes in D State too long too often

Message-Id: <20090206224551.7bbef0e4.akpm@linux-foundation.org>
Date:	Fri, 6 Feb 2009 22:45:51 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	"Gary L. Grobe" <gary@...be.net>
Cc:	linux-kernel@...r.kernel.org, linux-nfs@...r.kernel.org
Subject: Re: processes in D State too long too often

(cc linux-nfs)

On Sat, 07 Feb 2009 06:30:22 +0000 "Gary L. Grobe" <gary@...be.net> wrote:

> I'm currently running 2.6.27-r7 with Ethernet and Myrinet interconnects on quad-core and dual quad-core Dell 1950 slave nodes w/ 16-32 GB of RAM, and a master node that is a Dell 6850 w/ 32 GB.
> 
> I've got processes running on diskless nodes that are mounted to a master node via NFS. Several of these processes go into a D state, yet they recover back to running in a very short time (a few seconds), and then CPU usage goes from 0% back to 100%. That is expected: these CPUs should be running at 100%, since they're running number-crunching simulations, but while in D state their CPU usage drops to 0%.
> 
> So why are they in a D state, waiting on I/O? When I look on the master node, I see that several nfsd's are also in a D state, and they too recover back to running shortly afterwards (I see this shown as '-' in ps, or as R in 'top').
> 
> Running 'ps -eal', I see the following in the WCHAN column for the processes in a D state (which I believe is what they are waiting on). It can be a mix of these; I usually see 'sync_p', 'nfs_wa', 'lmGrou', and 'txLock'. My file system type is JFS.
> 
> Here's a snipped 'ps -eal' listing on the master node.
> 
> 1 D     0 26709     2  0  75  -5 -     0 lmGrou ?        00:00:07 nfsd
> 1 S     0 26710     2  0  75  -5 -     0 -      ?        00:00:07 nfsd
> 1 S     0 26711     2  0  75  -5 -     0 -      ?        00:00:04 nfsd
> 1 S     0 26712     2  0  75  -5 -     0 -      ?        00:00:08 nfsd
> 1 D     0 26713     2  0  75  -5 -     0 lmGrou ?        00:00:10 nfsd
> 1 S     0 26714     2  0  75  -5 -     0 -      ?        00:00:09 nfsd
> 1 D     0 26715     2  0  75  -5 -     0 txLock ?        00:00:08 nfsd
> 1 D     0 26716     2  0  75  -5 -     0 -      ?        00:00:09 nfsd
> 1 D     0 26717     2  0  75  -5 -     0 txLock ?        00:00:09 nfsd
> 1 S     0 26718     2  0  75  -5 -     0 -      ?        00:00:07 nfsd
> 1 D     0 26719     2  0  75  -5 -     0 -      ?        00:00:08 nfsd
> 1 D     0 26720     2  0  75  -5 -     0 sync_p ?        00:00:09 nfsd
> 1 S     0 26721     2  0  75  -5 -     0 -      ?        00:00:09 nfsd
> 1 S     0 26722     2  0  75  -5 -     0 -      ?        00:00:09 nfsd
> 
> And here's the same command on a diskless node which shows that my processes are in a D state w/ what seems to be nfs_wait (and from which they recover quite quickly, a few seconds later) ...
> 
> # ps -eal
> F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
> ...
> 1 S  1001  6145     1  0  80   0 -  6560 924758 ?        00:00:01 orted
> 0 D  1001  6146  6145 71  80   0 - 941316 nfs_wa ?       19:27:10 lve_1
> 0 D  1001  6147  6145 58  80   0 - 894594 nfs_wa ?       15:57:56 lve_1
> 0 R  1001  6148  6145 57  80   0 - 901343 -     ?        15:33:07 lve_1
> 0 R  1001  6149  6145 78  80   0 - 896065 -     ?        21:31:32 lve_1
> ...
> 
> 'rpcinfo -p master_node' shows that I have portmapper, mountd, nlockmgr, and nfs running w/ all the correct normal info.
> 
> It would seem as if NFS were dropping out intermittently, but I've gone through the whole NFS config and see nothing wrong. My DNS servers are working fine, it's all running on a local LAN (no firewall issues), and I see the same results on many different diskless nodes, so I don't believe it's a hardware issue. All my previous installations have run fine w/ this same NFS config.
> 
> Others have suggested this may be a 2.6.27-r7 kernel bug. I should note that I did not have this problem running a 2.6.17 kernel w/ XFS. The holdup seems to be in the kernel, and I'm looking for any advice on whether this might be the case.
> 
> Because these processes are going into a D state so often, a simulation that might normally run for 6 hours now takes 2 days to complete. I've tested the myrinet and ether interconnects and I see no issues from node to node or switch. I can reproduce the problem every time between any one node and the master.
> 
> So I'm looking for thoughts as to what might be going on and how to further investigate if this is in fact a kernel issue.
> 
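
To see which wait channels dominate across a `ps -eal` snapshot, one could tally the D-state rows by command and WCHAN. The following is a minimal sketch, not part of the original thread: it assumes the Linux procps `ps -eal` column layout shown in the listings above (F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD), and `count_d_state_wchan` is a hypothetical helper name.

```python
from collections import Counter

# Assumed column layout of Linux procps `ps -eal` output:
# F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
N_COLS = 14
STATE_COL = 1
WCHAN_COL = 10
CMD_COL = 13


def count_d_state_wchan(ps_output: str) -> Counter:
    """Tally tasks in uninterruptible sleep (state 'D') by (CMD, WCHAN)."""
    counts: Counter = Counter()
    for line in ps_output.splitlines():
        # maxsplit keeps a command name that contains spaces in one field
        fields = line.rstrip().split(maxsplit=N_COLS - 1)
        if len(fields) < N_COLS or fields[0] == "F":
            continue  # header row, "..." elisions, or truncated lines
        if fields[STATE_COL] == "D":
            counts[(fields[CMD_COL], fields[WCHAN_COL])] += 1
    return counts
```

Run over the master-node listing above, it would count two nfsd tasks each on `lmGrou`, `txLock`, and `-`, and one on `sync_p`. Note that ps truncates WCHAN names to its column width, which is why they appear as `lmGrou` and `nfs_wa`.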

I guess it would help if you can run

	echo w > /proc/sysrq-trigger

and manage to hit enter when this is happening.

Then run

	dmesg -c -s 1000000 > foo

then check `foo' to see that you caught some nice traces of the stuck tasks.

Then send us those traces.  Please try to avoid wordwrapping them in
the email.
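
Since the stalls last only a few seconds, hitting enter at the right moment by hand can be hit-and-miss. As a hedged sketch that is not from this thread, a small Python poller could watch /proc and write 'w' to /proc/sysrq-trigger itself as soon as tasks enter D state. It assumes root privileges and the standard /proc/<pid>/stat layout, where the state letter follows the parenthesised command name; `parse_stat_state`, `d_state_pids`, and `watch_and_dump` are hypothetical helper names.

```python
from __future__ import annotations

import os
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def parse_stat_state(stat_line: str) -> str:
    """Return the state letter from a /proc/<pid>/stat line.

    The comm field is wrapped in parentheses and may itself contain
    spaces or ')', so split after the *last* ')'.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def d_state_pids(proc_root: str = "/proc") -> list[int]:
    """List PIDs currently in uninterruptible sleep (state 'D')."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                if parse_stat_state(f.read()) == "D":
                    pids.append(int(name))
        except (OSError, ValueError, IndexError):
            continue  # task exited between listdir() and open(), or odd data
    return pids


def watch_and_dump(min_d_tasks: int = 1, interval: float = 0.5) -> None:
    """Poll /proc; once enough tasks are in D state, fire SysRq 'w' once.

    Must run as root. SysRq 'w' dumps the blocked (D-state) tasks to the
    kernel log, where dmesg can then collect them.
    """
    while True:
        if len(d_state_pids()) >= min_d_tasks:
            with open(SYSRQ_TRIGGER, "w") as f:
                f.write("w")
            return
        time.sleep(interval)
```

As root, calling `watch_and_dump(min_d_tasks=2)` would fire a single dump once two or more tasks are blocked; the traces can then be saved with the dmesg command above. Polling /proc rather than parsing ps output avoids spawning a process on every iteration.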
