linux-kernel - Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

Message-ID: <1382124981.20461.4.camel@leira.trondhjem.org>
Date:	Fri, 18 Oct 2013 19:36:22 +0000
From:	"Myklebust, Trond" <Trond.Myklebust@...app.com>
To:	Helge Deller <deller@....de>
CC:	Linux Kernel Development <linux-kernel@...r.kernel.org>,
	NFS list <linux-nfs@...r.kernel.org>,
	linux-parisc <linux-parisc@...r.kernel.org>
Subject: Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100%
 CPU?

On Fri, 2013-10-18 at 21:26 +0200, Helge Deller wrote:
> On 10/17/2013 11:07 PM, Myklebust, Trond wrote:
> > On Thu, 2013-10-17 at 22:42 +0200, Helge Deller wrote:
> >> I'm seeing a regression with current kernel git head when using NFS-mounts.
> >> Architecture in my case is parisc, although I don't think that this is relevant.
> >> At least kernel 3.10 (and I think 3.11) didn't show that problem.
> >>
> >> The symptom is that "top" shows high usage of either kswapd0 or kswapd1.
> >> Here is an output with kswapd1:
> >>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME COMMAND
> >>    37 root      20   0     0    0    0 R  91.8  0.0  63:00.40 kswapd1
> >> 28448 root      20   0  3252 1428 1060 R  15.3  0.0   0:00.09 top
> >>     1 root      20   0  2784  988  852 S   0.0  0.0   0:09.95 init
> >>
> >> This is what ps shows:
> >> lsXXXX:~# ps -ef |  grep mount
> >> root      1181     1  0 14:51 ?        00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid
> >> root     25331  1181  0 21:25 ?        00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1
> >> root     25332 25331  0 21:25 ?        00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr
> >>
> >> And using sysrq to show the blocked tasks I get in syslog:
> >> SysRq : Show Blocked State
> >> mount.nfs       D 00000000401040c0     0 25332  25331 0x00000010
> >> Backtrace:
> >> [<0000000040113a68>] __schedule
> >>
> >> I know it's not a problem of the NFS server, since the same mount is still ok on other machines.
> >> The NFS directory was already mounted and in use when this mount happened again (called by cron-job). 
> >>  
> >> Any ideas?
> > 
> > If the NFS directory is already mounted, then why is the automounter
> > trying to mount it a second time?
> 
> I was wrong in this.
> The directory wasn't mounted yet (or at least it was unmounted in the meantime before the new
> mount.nfs was called).
> 
> I'm now not even sure that the high kswapd usage is really triggered by the NFS problem,
> because I now have another machine with the blocked NFS-mount, but without
> the high kswapd usage.
> 
> Nevertheless, the blocked nfs mount tasks really make me wonder. There is clearly
> some kind of regression since it doesn't happen with older kernels.

Have you ever reproduced it without the automounter?

Also, could you please try a sysRQ-t the next time it happens, so that
we can get a trace of where the mount program is hanging. Knowing that
the mount is stuck in "__schedule()" is not really interesting unless we
know from where that was called.
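
For reference, a minimal sketch of one way to request that trace from a
shell rather than the keyboard. It assumes a root shell and that
/proc/sysrq-trigger is available; the helper name dump_all_tasks and its
optional path argument are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: ask the kernel to dump the state and stack of every
# task (the SysRq "t" command). The output goes to the kernel log.
# The optional path argument is an assumption added here so the helper can
# be pointed at another file; by default it writes to /proc/sysrq-trigger.
dump_all_tasks() {
    trigger="${1:-/proc/sysrq-trigger}"
    printf 't' > "$trigger"
}

# Usage (as root), once mount.nfs is stuck:
#   dump_all_tasks
#   dmesg | grep -A 20 'mount.nfs'
```

The grep context length of 20 lines is an arbitrary choice; the point is
to capture the callers listed below __schedule in the backtrace.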

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@...app.com
www.netapp.com