[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <454FD2BE.2090302@us.ibm.com>
Date: Mon, 06 Nov 2006 18:26:38 -0600
From: Steve French <smfltc@...ibm.com>
To: Dave Kleikamp <shaggy@...ux.vnet.ibm.com>
CC: Eric Sandeen <sandeen@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Theodore Tso <tytso@....edu>
Subject: Re: [RFC/PATCH] - revert generic_fillattr stat->blksize to PAGE_CACHE_SIZE
Dave Kleikamp wrote:
> On Mon, 2006-11-06 at 15:50 -0600, Eric Sandeen wrote:
>
>> The inode diet patches first did this to generic_fillattr():
>>
>> - stat->blksize = inode->i_blksize;
>> + stat->blksize = PAGE_CACHE_SIZE;
>>
>> but by 2.6.19-rc3 this was changed again:
>>
>> - stat->blksize = PAGE_CACHE_SIZE;
>> + stat->blksize = (1 << inode->i_blkbits);
>>
>> I can't find for sure why this was done; perhaps because it exposed a bug
>> in cifs[1]
>>
>
> Steve has fixed the bug in cifs_readdir().
>
>
>> However, if we are going to pick a default for generic_fillattr, it should probably
>> be what most filesystems were using before, and let filesystems which need something
>> else re-set it to according to their needs. As it stands today, doing readdirs and
>> the like in block-sized chunks rather than page sized will probably not be the
>> best thing for performance.
>>
>
> I agree.
>
>
>> so I would propose the following patch to make PAGE_CACHE_SIZE the default (again),
>> and let filesystems which need something -else- do that on their own.
>>
>> [1] http://lists.samba.org/archive/linux-cifs-client/2006-September/001481.html
>>
>> Thanks,
>> -Eric
>>
>> Signed-off-by: Eric Sandeen <sandeen@...hat.com>
>>
>> --- linux.orig/fs/stat.c 2006-11-05 23:27:04.962482569 -0600
>> +++ linux/fs/stat.c 2006-11-05 23:27:29.394396050 -0600
>> @@ -33,7 +33,7 @@
>> stat->ctime = inode->i_ctime;
>> stat->size = i_size_read(inode);
>> stat->blocks = inode->i_blocks;
>> - stat->blksize = (1 << inode->i_blkbits);
>> + stat->blksize = PAGE_CACHE_SIZE;
>> }
>>
>> EXPORT_SYMBOL(generic_fillattr);
>>
>
> Looks good. Since cifs is affected by this patch, I propose that cifs
> explicitly set stat->blksize:
>
> CIFS: Explicitly set stat->blksize
>
> CIFS may perform I/O over the network in larger chunks than the page size,
> so it should explicitly set stat->blksize to ensure optimal I/O bandwidth
>
> Signed-off-by: Dave Kleikamp <shaggy@...ux.vnet.ibm.com>
>
> diff -Nurp linux.orig/fs/cifs/inode.c linux/fs/cifs/inode.c
> --- linux.orig/fs/cifs/inode.c 2006-11-03 13:44:04.000000000 -0600
> +++ linux/fs/cifs/inode.c 2006-11-06 16:11:21.000000000 -0600
> @@ -1089,8 +1089,10 @@ int cifs_getattr(struct vfsmount *mnt, s
> struct kstat *stat)
> {
> int err = cifs_revalidate(dentry);
> - if (!err)
> + if (!err) {
> generic_fillattr(dentry->d_inode, stat);
> + stat->blksize = CIFS_MAX_MSGSIZE;
> + }
> return err;
> }
>
>
>
I assumed that the original intent of the "inode diet patch" was to
remove fields in the inode,
which most filesystems can get out of the superblock. If
inode->blksize and inode->blkbits were
related (2**blkbits == blksize) , it also makes sense to me that someone
(Ted?) removed one and left the other
as one would be redundant, but some filesystems like cifs have a large
recommended i/o size (16K),
but if someone wants to remove both from the inode that is fine by me,
as long as cifs
stat->blksize is set as you suggested on the way out of cifs_getattr.
Eventually cifs should set
the stat->blksize to a smaller value for one rarer case. For the most
common case
cifs should still set it to 16K (CIFS_MAX_MSGSIZE) as that is the most
common negotiated
buffer size but if the server does not negotiate large read support, and
the server also is so old
that it negotiates a buffer size smaller than 16K (e.g. Windows95
negotiates 2K IIRC)
then we could set stat-blksize to the smaller negotiated buffer size -
but since those servers are
getting rarer it is probably not that important. More interesting
will be the future cases in
which we will be able to set this value larger to more servers, as in
general for modern
network adapters, the larger network i/o size the better.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists