Message-Id: <20080521001930.202446eb.akpm@linux-foundation.org>
Date: Wed, 21 May 2008 00:19:30 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Hisashi Hifumi <hifumi.hisashi@....ntt.co.jp>
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH] VFS: Pagecache usage optimization on pagesize != blocksize environment
On Wed, 21 May 2008 15:52:04 +0900 Hisashi Hifumi <hifumi.hisashi@....ntt.co.jp> wrote:
> Hi.
>
> When we read part of a file through the pagecache and a page at the
> corresponding index exists but is not uptodate, read IO is issued to
> bring the page uptodate.
> I think this is fine in a pagesize == blocksize environment, but there
> is room for improvement in a pagesize != blocksize environment, because
> there a page can have multiple buffers, and even if the page as a whole
> is not uptodate, some of its buffers can be. So I suggest that when all
> the buffers covering the part of the file we want to read are uptodate,
> we use this cached page and copy the data from it to the user buffer
> even though the page itself is not uptodate. This can reduce read IO
> and improve system throughput.
I suppose that makes sense.
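For reference, the test being described amounts to a walk over the
page's buffer ring, something like this (an untested sketch; the
function name is made up, and check_buffers_uptodate() in the patch
below plays this role):

	#include <linux/buffer_head.h>
	#include <linux/pagemap.h>

	/*
	 * Sketch: return 1 if every buffer which covers
	 * [offset, offset + count) within the page is uptodate.
	 */
	static int buffers_uptodate_in_range(struct page *page,
					unsigned offset, unsigned count)
	{
		struct buffer_head *head, *bh;
		unsigned block_start = 0, block_end;
		unsigned to = offset + count;

		if (to > PAGE_CACHE_SIZE)
			to = PAGE_CACHE_SIZE;
		if (!page_has_buffers(page))
			return 0;

		bh = head = page_buffers(page);
		do {
			block_end = block_start + bh->b_size;
			/* only buffers overlapping the read matter */
			if (block_end > offset && block_start < to &&
			    !buffer_uptodate(bh))
				return 0;
			block_start = block_end;
			bh = bh->b_this_page;
		} while (bh != head);
		return 1;
	}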
> I did a performance test using the sysbench.
That's not a terribly good benchmark, IMO. It's too complex.
To work out the best-case for a change like this I'd suggest a
microbenchmark which does something such as seeking all around a file
doing single-byte reads.
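The best case could be measured with something as small as this
(illustrative and untested; the iteration count is arbitrary):

	/* seek all around a file, reading one byte at a time */
	#include <fcntl.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		int fd;
		off_t size;
		long i;
		char c;

		if (argc < 2)
			return 1;
		fd = open(argv[1], O_RDONLY);
		if (fd < 0)
			return 1;
		size = lseek(fd, 0, SEEK_END);
		if (size <= 0)
			return 1;
		srandom(getpid());
		for (i = 0; i < 1000000; i++) {
			lseek(fd, random() % size, SEEK_SET);
			read(fd, &c, 1);
		}
		close(fd);
		return 0;
	}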
Then one should think up a benchmark which demonstrates the worst-case,
such as reading one-byte-quantities from a file at offsets 0, 0x2000,
0x4000, 0x6000, ... and then read more one-byte-quantities at offsets
0x1000, 0x3000, 0x5000, etc. That would be a pretty cruel comparison,
but as one tosses in more such artificial workloads, one is in a better
position to work out whether the change is an aggregate benefit.
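In the same harness as above, the cruel case would replace the random
offsets with two interleaved passes (offsets illustrative):

	off_t off;

	/* first pass: offsets 0, 0x2000, 0x4000, ... */
	for (off = 0; off < size; off += 0x2000) {
		lseek(fd, off, SEEK_SET);
		read(fd, &c, 1);
	}
	/* second pass: offsets 0x1000, 0x3000, 0x5000, ... */
	for (off = 0x1000; off < size; off += 0x2000) {
		lseek(fd, off, SEEK_SET);
		read(fd, &c, 1);
	}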
The results from a great big lumped-together benchmark such as sysbench
aren't a lot of use to us in predicting how effective this change will
be across the full range of workloads which the kernel must support.
> @@ -932,8 +932,16 @@ find_page:
>  					ra, filp, page,
>  					index, last_index - index);
>  		}
> -		if (!PageUptodate(page))
> -			goto page_not_up_to_date;
> +		if (!PageUptodate(page)) {
> +			if (inode->i_blkbits == PAGE_CACHE_SHIFT)
> +				goto page_not_up_to_date;
> +			if (TestSetPageLocked(page))
> +				goto page_not_up_to_date;
> +			if (!page_has_buffers(page) ||
> +			    !check_buffers_uptodate(offset, desc, page))
We shouldn't do this.
> +				goto page_not_up_to_date_locked;
> +			unlock_page(page);
> +		}
See, the code which you have here is assuming that if PagePrivate is
set, then the thing which is at page.private is a ring of buffer_heads.
But this code (do_generic_file_read) doesn't know that! Take a look at
afs, nfs, perhaps other filesystems, grep for set_page_private().
Only the address_space implementation (ie: the filesystem) knows
whether page.private holds buffer_heads and only the
address_space_operations functions are allowed to call into library
functions which treat page.private as a buffer_head ring.
Now, your code _may_ not crash, because perhaps there is no filesystem
which puts something else into page.private which also uses
do_generic_file_read(). But it's still wrong.
I guess a suitable fix might be to implement the above using a new
address_space_operations callback:
	if (PagePrivate(page) && aops->is_partially_uptodate) {
		if (aops->is_partially_uptodate(page, desc, offset)) {
			/* OK, we can copy the data */
		}
	}
then implement a generic_file_is_partially_uptodate() in fs/buffer.c
and wire that up in the filesystems.
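In rough form (a sketch only: is_partially_uptodate does not exist
today, and the signature below just mirrors the pseudocode above):

	/* fs/buffer.c: walk the page's buffer ring, as sketched earlier,
	 * and return nonzero if all buffers backing the read are uptodate */
	int generic_file_is_partially_uptodate(struct page *page,
				read_descriptor_t *desc, unsigned long offset);

	/* then, in each blockdev-based filesystem, e.g. fs/ext3/inode.c: */
	static const struct address_space_operations ext3_ordered_aops = {
		.readpage		= ext3_readpage,
		/* ... existing ops unchanged ... */
		.is_partially_uptodate	= generic_file_is_partially_uptodate,
	};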
Note that things like network filesystems can then implement this also.