Message-ID: <20150508151459.GA63078@tonberry.usersys.redhat.com>
Date: Fri, 8 May 2015 11:14:59 -0400
From: Scott Mayhew <smayhew@...hat.com>
To: NeilBrown <neilb@...e.de>
Cc: Trond Myklebust <trond.myklebust@...marydata.com>,
Anna Schumaker <anna.schumaker@...app.com>,
linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] NFS: report more appropriate block size for directories.
On Fri, 08 May 2015, NeilBrown wrote:
>
> In glibc 2.21 (and several previous), a call to opendir() will
> result in a 32K (BUFSIZ*4) buffer being allocated and passed to
> getdents.
>
> However a call to fdopendir() results in an 'fstat' request to
> determine block size and a matching buffer allocated for subsequent
> use with getdents. This will typically be 1M.
>
> The first getdents call on an NFS directory will always use
> READDIR_PLUS (or NFSv4 equivalent) if available. Subsequent getdents
> calls only use this more expensive version if some 'stat' requests are
> made between the getdents calls.
>
> For this reason it is good to keep at least that first getdents call
> relatively short. When fdopendir() and readdir() are used on a large
> directory, it takes approximately 32 times as long to complete as
> using "opendir". Current versions of 'find' use fdopendir() and
> demonstrate this slowness.
>
> 'stat' on a directory currently returns the 'wsize'. This number has
> no meaning on directories.
> Actual READDIR requests are limited to ->dtsize, which itself is
> capped at 4 pages, coincidentally the same as BUFSIZ*4.
> So this is a meaningful number to use as the blocksize on directories,
> and has the effect of making 'find' on large directories go a lot
> faster.
Would it make sense to do something similar for regular files too?
fopen() does a similar buffer allocation unless the application
overrides the buffer size via setbuffer()/setvbuf(). That can then
result in fseek() reading a lot of unnecessary data over the wire.
Prior to commit ba52de1 (inode-diet: Eliminate i_blksize from the inode
structure), a stat() over nfs would return the page size in st_blksize,
and for some workloads it does make a difference. For instance, I have
a customer running gdb in a diskless environment. On a stock kernel
where a stat() over nfs returns the wsize in st_blksize, their job takes
~19 minutes... on a test kernel where a stat() over nfs returns the page
size instead, that same job takes ~13 minutes. I hadn't sent a patch
yet because I'm still trying to account for a few extra minutes of
run time elsewhere...
-Scott
>
> Signed-off-by: NeilBrown <neilb@...e.de>
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 96f2d55781fb..f8aebf59383f 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -678,6 +678,8 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>  	if (!err) {
>  		generic_fillattr(inode, stat);
>  		stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
> +		if (S_ISDIR(inode->i_mode))
> +			stat->blksize = NFS_SERVER(inode)->dtsize;
>  	}
>  out:
>  	trace_nfs_getattr_exit(inode, err);