linux-kernel - Re: Massive slowdown when re-querying large nfs dir

Message-Id: <20071107090529.f45626de.akpm@linux-foundation.org>
Date:	Wed, 7 Nov 2007 09:05:29 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Al Boldi <a1426z@...ab.com>
Cc:	neilb@...e.de, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: Massive slowdown when re-querying large nfs dir

> On Wed, 7 Nov 2007 12:36:26 +0300 Al Boldi <a1426z@...ab.com> wrote:
> Neil Brown wrote:
> > On Tuesday November 6, akpm@...ux-foundation.org wrote:
> > > > On Tue, 6 Nov 2007 14:28:11 +0300 Al Boldi <a1426z@...ab.com> wrote:
> > > > Al Boldi wrote:
> > > > > There is a massive (3-18x) slowdown when re-querying a large nfs dir
> > > > > (2k+ entries) using a simple ls -l.
> > > > >
> > > > > On 2.6.23 client and server running userland rpc.nfs.V2:
> > > > > first  try: time -p ls -l <2k+ entry dir>  in ~2.5sec
> > > > > more tries: time -p ls -l <2k+ entry dir>  in ~8sec
> > > > >
> > > > > first  try: time -p ls -l <5k+ entry dir>  in ~9sec
> > > > > more tries: time -p ls -l <5k+ entry dir>  in ~180sec
> > > > >
> > > > > On 2.6.23 client and 2.4.31 server running userland rpc.nfs.V2:
> > > > > first  try: time -p ls -l <2k+ entry dir>  in ~2.5sec
> > > > > more tries: time -p ls -l <2k+ entry dir>  in ~7sec
> > > > >
> > > > > first  try: time -p ls -l <5k+ entry dir>  in ~8sec
> > > > > more tries: time -p ls -l <5k+ entry dir>  in ~43sec
> > > > >
> > > > > Remounting the nfs-dir on the client resets the problem.
> > > > >
> > > > > Any ideas?
> > > >
> > > > Ok, I played some more with this, and it turns out that nfsV3 is a lot
> > > > faster.  But, this does not explain why the 2.4.31 kernel is still
> > > > over 4-times faster than 2.6.23.
> > > >
> > > > Can anybody explain what's going on?
> > >
> > > Sure, Neil can! ;)
> 
> Thanks Andrew!
> 
> > Nuh.
> > He said "userland rpc.nfs.Vx".  I only do "kernel-land NFS".  In these
> > days of high specialisation, each line of code is owned by a different
> > person, and finding the right person is hard....
> >
> > I would suggest getting a 'tcpdump -s0' trace and seeing (with
> > wireshark) what is different between the various cases.
> 
> Thanks Neil for looking into this.  Your suggestion has already been answered 
> in a previous post, where the difference has been attributed to "ls -l" 
> inducing lookup for the first try, which is fast, and getattr for later 
> tries, which is super-slow.
> 
> Now it's easy to blame the userland rpc.nfs.V2 server for this, but what's 
> not clear is how come 2.4.31 handles getattr faster than 2.6.23?
> 

We broke 2.6?  It'd be interesting to run the ls in an infinite loop on the
client, then start poking at the server.  Is the 2.6 server doing physical
IO?  Is the 2.6 server consuming more system time?  etc.  A basic
`vmstat 1' trace for both 2.4 and 2.6 would be a starting point.

Could be that there's some additional latency caused by networking changes,
too.  I expect the tcpdump/wireshark/etc traces would have sufficient
resolution for us to be able to see that.
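
A minimal sketch of the re-query loop suggested above, for running on the
client while watching `vmstat 1' on the server.  It assumes that reading the
directory and then calling lstat on every entry is a reasonable stand-in for
what `ls -l' does.  The function names and the fixed count of 5 tries are
illustrative and not from the original report.

```python
import os
import sys
import time


def list_long(path):
    """Approximate `ls -l`: read the directory, then lstat each entry."""
    count = 0
    for name in os.listdir(path):
        os.lstat(os.path.join(path, name))
        count += 1
    return count


def time_listings(path, tries):
    """Return one (entry_count, elapsed_seconds) pair per try."""
    results = []
    for _ in range(tries):
        start = time.monotonic()
        count = list_long(path)
        results.append((count, time.monotonic() - start))
    return results


if __name__ == "__main__":
    # Optional first argument: the directory to list (defaults to ".").
    have_dir = len(sys.argv) > 1 and os.path.isdir(sys.argv[1])
    directory = sys.argv[1] if have_dir else "."
    for i, (count, elapsed) in enumerate(time_listings(directory, 5), 1):
        print(f"try {i}: {count} entries in {elapsed:.3f}s")
```

Pointing it at the nfs-mounted directory should show whether the first try
is fast and the later tries slow, as in the timings quoted above.  Remounting
between runs would be needed to reproduce the "first try" case.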
