Date:	Mon, 04 Aug 2008 18:04:34 +1000
From:	Greg Banks <gnb@...bourne.sgi.com>
To:	"J. Bruce Fields" <bfields@...ldses.org>
CC:	Michael Shuey <shuey@...due.edu>, linux-kernel@...r.kernel.org,
	linux-nfs@...r.kernel.org, rees@...i.umich.edu, aglo@...i.umich.edu
Subject: Re: high latency NFS

J. Bruce Fields wrote:
> You might get more responses from the linux-nfs list (cc'd).
>
> --b.
>
> On Thu, Jul 24, 2008 at 01:11:31PM -0400, Michael Shuey wrote:
>   
>>
>> iozone is reading/writing a file twice the size of memory on the client with 
>> a 32k block size.  I've tried raising this as high as 16 MB, but I still 
>> see around 6 MB/sec reads.
>>     
That won't make a skerrick of difference with wsize=32K.
>> I'm using a 2.6.9 derivative (yes, I'm a RHEL4 fan).  Testing with a stock 
>> 2.6, client and server, is the next order of business.
>>
>> NFS mount is tcp, version 3.  rsize/wsize are 32k.
Try wsize=rsize=1M.
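Something along these lines, for concreteness (the server name and
export path are placeholders, and the client will clamp rsize/wsize to
whatever the server actually supports):

    # NFSv3 over TCP with 1MB transfers (illustrative only)
    mount -t nfs -o vers=3,proto=tcp,rsize=1048576,wsize=1048576 \
        server:/export /mnt/nfs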
>>   Both client and server 
>> have had tcp_rmem, tcp_wmem, wmem_max, rmem_max, wmem_default, and 
>> rmem_default tuned - tuning values are 12500000 for defaults (and minimum 
>> window sizes), 25000000 for the maximums.  Inefficient, yes, but I'm not 
>> concerned with memory efficiency at the moment.
>>     
You're aware that the server screws these up again, at least for
writes?  There was a long sequence of threads on linux-nfs about this
recently, starting with

http://marc.info/?l=linux-nfs&m=121312415114958&w=2

which is Dean Hildebrand posting a patch to make the knfsd behaviour
tunable.  ToT (top of tree) still looks broken.  I've been using the
attached patch (I believe a similar one was posted later in the thread
by Olga Kornievskaia) for low-latency high-bandwidth 10GbE performance
work, where it doesn't help but doesn't hurt either.  It should help
for your high-latency high-bandwidth case.  Keep your tunings, though;
one of them will be affecting the TCP window scale negotiated at
connect time.
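For reference, the knobs you list map to sysctls roughly like this
(values are just the ones you quoted, purely illustrative;
tcp_rmem/tcp_wmem take "min default max" triples):

    # on both client and server (illustrative, using your numbers)
    sysctl -w net.core.rmem_default=12500000
    sysctl -w net.core.wmem_default=12500000
    sysctl -w net.core.rmem_max=25000000
    sysctl -w net.core.wmem_max=25000000
    sysctl -w net.ipv4.tcp_rmem="12500000 12500000 25000000"
    sysctl -w net.ipv4.tcp_wmem="12500000 12500000 25000000"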
>> Both client and server kernels have been modified to provide 
>> larger-than-normal RPC slot tables.  I allow a max of 1024, but I've found 
>> that actually enabling more than 490 entries in /proc causes mount to 
>> complain it can't allocate memory and die.  That was somewhat surprising, 
>> given I had 122 GB of free memory at the time...
>>     
That number is used to size a physically contiguous kmalloc()ed array of
slots.  With a large wsize you don't need such large slot table sizes or
large numbers of nfsds to fill the pipe.

And yes, the default number of nfsds is utterly inadequate.
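Rough sketch of where I'd start instead (both paths are the standard
procfs knobs; the numbers are starting points, not recommendations):

    # client: a modest slot table is plenty once rsize/wsize = 1M
    echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries
    # server: raise the nfsd thread count well past the distro
    # default (typically 8)
    echo 128 > /proc/fs/nfsd/threads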
>> I've also applied a couple patches to allow the NFS readahead to be a 
>> tunable number of RPC slots. 
There's a patch in SLES to do that, which I'd very much like to see
in kernel.org (Neil?).  The default NFS readahead multiplier value is
pessimal and guarantees worst-case alignment of READ RPCs during
streaming reads, so we tune it from 15 to 16.

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


[Attachment: "knfsd-tcp-receive-buffer-scaling" (text/plain, 971 bytes)]
