Message-ID: <20070419224225.GJ32602149@melbourne.sgi.com>
Date: Fri, 20 Apr 2007 08:42:25 +1000
From: David Chinner <dgc@....com>
To: Christoph Lameter <clameter@....com>
Cc: linux-kernel@...r.kernel.org,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Nick Piggin <nickpiggin@...oo.com.au>,
Paul Jackson <pj@....com>, Dave Chinner <dgc@....com>,
Andi Kleen <ak@...e.de>
Subject: Re: [RFC 0/8] Variable Order Page Cache
On Thu, Apr 19, 2007 at 09:35:04AM -0700, Christoph Lameter wrote:
> Variable Order Page Cache Patchset
>
> This patchset modifies the core VM so that higher order page cache pages
> become possible. The higher order page cache pages are compound pages
> and can be handled in the same way as regular pages.
>
> The order of the pages is determined by the order set up in the mapping
> (struct address_space). By default the order is set to zero.
> This means that higher order pages are optional. There is no attempt here
> to generally change the page order of the page cache. 4K pages are effective
> for small files.
>
> However, it would be good if the VM would support I/O to higher order pages
> to enable efficient support for large scale I/O. If one wants to write a
> long file of a few gigabytes then the filesystem should have a choice of
> selecting a larger page size for that file and handle larger chunks of
> memory at once.
>
> The support here is only for buffered I/O and only for one filesystem (ramfs).
> Modification of other filesystems to support higher order pages may require
> extensive work of other components of the kernel. But I hope this shows that
> there is a relatively easy way to that goal that could be taken in steps..
So looking at this, the main work in converting a filesystem is some extra
bits in the mount process and replacing the PAGE_CACHE_* macros with
page_cache_*() wrapper functions.
We can probably set all this up trivially with XFS by allowing block size > page
size filesystems to be mounted and modifying the way we feed pages to a bio
to be aware of compound pages.
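
To make the PAGE_CACHE_* to page_cache_*() conversion concrete, here is a minimal user-space sketch of what per-mapping wrappers could look like. Everything in it is an assumption for illustration: `struct mapping_model` stands in for struct address_space, the 4K base page and the `MODEL_MAX_ORDER` limit are arbitrary, and the specific helper names are guesses at the page_cache_*() family rather than the patchset's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_PAGE_SHIFT 12	/* assumption: 4K base pages */
#define MODEL_MAX_ORDER 10	/* assumption: arbitrary upper bound for the model */

/* Hypothetical stand-in for struct address_space; only the order matters here. */
struct mapping_model {
	unsigned int order;	/* page cache order of this mapping, 0 by default */
};

/* Would be called from the mount/inode setup path of a converted filesystem. */
static inline void set_page_cache_order(struct mapping_model *m, unsigned int order)
{
	assert(order <= MODEL_MAX_ORDER);
	m->order = order;
}

/* Replaces the constant PAGE_CACHE_SHIFT with a per-mapping value. */
static inline unsigned int page_cache_shift(const struct mapping_model *m)
{
	return BASE_PAGE_SHIFT + m->order;
}

/* Replaces the constant PAGE_CACHE_SIZE. */
static inline size_t page_cache_size(const struct mapping_model *m)
{
	return (size_t)1 << page_cache_shift(m);
}

/* Byte offset of a file position within its page cache page. */
static inline size_t page_cache_offset(const struct mapping_model *m,
				       unsigned long long pos)
{
	return (size_t)(pos & (page_cache_size(m) - 1));
}

/* Index of the page cache page that holds a file position. */
static inline unsigned long long page_cache_index(const struct mapping_model *m,
						  unsigned long long pos)
{
	return pos >> page_cache_shift(m);
}
```

With order 0 the helpers reduce to the usual 4K arithmetic, which is why a conversion of this kind can be done mechanically before a filesystem ever asks for a larger order.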
> Note that the higher order pages are subject to reclaim. This works in general
> since we are always operating on a single page struct. Reclaim is fooled to
> think that it is touching page sized objects (there are likely issues to be
> fixed there if we want to go down this road).
>
> What is currently not supported:
> - Buffer heads for higher order pages (possible with the compound pages in mm
> that do not use page->private requires upgrade of the buffer cache layers).
Does this mean that the -mm code will currently support bufferheads on compound
pages? We need that before we can get XFS to work with compound pages.
> - Higher order pages in the block layer etc.
It's more drivers that we have to worry about, I think. We don't need to
modify bios to explicitly support compound pages. From bio.h:
	/*
	 * was unsigned short, but we might as well be ready for > 64kB I/O pages
	 */
	struct bio_vec {
		struct page	*bv_page;
		unsigned int	bv_len;
		unsigned int	bv_offset;
	};
So compound pages should be transparent to anything that doesn't
look at the contents of bio_vecs....
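
The point about the field width can be checked with a small user-space model. This is only a sketch: `struct page_model` and `struct bio_vec_model` are hypothetical stand-ins for the kernel structures, a 4K base page is assumed, and `truncated_len()` assumes the usual 16-bit unsigned short.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_PAGE_SIZE 4096u	/* assumption: 4K base pages */

/* Hypothetical stand-in for a compound page head; only the order is modelled. */
struct page_model {
	unsigned int order;
};

/* Same field layout as the bio_vec quoted above, with model types. */
struct bio_vec_model {
	struct page_model	*bv_page;
	unsigned int		bv_len;
	unsigned int		bv_offset;
};

/* Total bytes covered by a compound page of the given order. */
static unsigned int compound_bytes(const struct page_model *p)
{
	return BASE_PAGE_SIZE << p->order;
}

/* One vector describing the whole compound page, starting at offset 0. */
static struct bio_vec_model vec_for_whole_page(struct page_model *p)
{
	struct bio_vec_model v = {
		.bv_page = p,
		.bv_len = compound_bytes(p),
		.bv_offset = 0,
	};
	return v;
}

/* What an unsigned short length field would hold for the same length. */
static unsigned short truncated_len(unsigned int len)
{
	return (unsigned short)len;
}
```

An order-4 page (64kB with 4K base pages) fits in bv_len as it stands, while a 16-bit field would wrap to zero for the same length.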
> - Mmapping higher order pages
*nod*
Hmmm - what about the way we do copyin and copyout from the page cache? i.e.
we kmap_atomic() the pages before we access them. Does this need to change?
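
If the answer turns out to be yes, one possible shape for the copy path is to walk the compound page one base page at a time, since kmap_atomic() maps a single page. The sketch below is a user-space model under stated assumptions: `struct compound_model`, `map_one()` and `unmap_one()` are hypothetical stand-ins (the latter two for kmap_atomic()/kunmap_atomic()), and the base pages are deliberately not assumed to be virtually contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BASE_PAGE_SIZE 4096u	/* assumption: 4K base pages */

/*
 * Hypothetical model of a compound page: 1 << order base pages that are
 * NOT assumed to be virtually contiguous (as with highmem).
 */
struct compound_model {
	unsigned int	order;
	unsigned char	**base;	/* 1 << order pointers, BASE_PAGE_SIZE bytes each */
};

/* Stand-in for kmap_atomic(): maps exactly one base page. */
static unsigned char *map_one(struct compound_model *cp, size_t idx)
{
	return cp->base[idx];
}

/* Stand-in for kunmap_atomic(): nothing to undo in the model. */
static void unmap_one(unsigned char *addr)
{
	(void)addr;
}

/* Copy len bytes starting at offset, never touching more than one mapping at once. */
static void copy_from_compound(void *dst, struct compound_model *cp,
			       size_t offset, size_t len)
{
	unsigned char *out = dst;

	assert(offset + len <= ((size_t)BASE_PAGE_SIZE << cp->order));

	while (len > 0) {
		size_t idx = offset / BASE_PAGE_SIZE;
		size_t in_page = offset % BASE_PAGE_SIZE;
		size_t chunk = BASE_PAGE_SIZE - in_page;
		unsigned char *src;

		if (chunk > len)
			chunk = len;

		src = map_one(cp, idx);
		memcpy(out, src + in_page, chunk);
		unmap_one(src);

		out += chunk;
		offset += chunk;
		len -= chunk;
	}
}
```

A copy that straddles a base page boundary is split into two chunks, each made under its own mapping.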
> The ramfs driver can be used to test higher order page cache functionality
> (and may help troubleshoot the VM support until we get some real filesystem
> and real devices supporting higher order pages).
I don't think it will take much to get XFS to work with a high order
page cache and we can probably insulate the block layer initially with some
kind of bio_add_compound_page() wrapper and some similar
wrapper on the io completion side.
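
One way such a wrapper could insulate the lower layers is by splitting a compound page into base-page-sized vectors before they reach the bio, so that drivers only ever see page-sized segments. The sketch below is a user-space model, not a proposed implementation: only the bio_add_compound_page() name comes from the text above, while `struct bio_model`, `struct vec_model`, the `base_idx` field, the vector limit and the all-or-nothing return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_PAGE_SIZE 4096u	/* assumption: 4K base pages */
#define MODEL_MAX_VECS 64	/* assumption: arbitrary per-bio vector limit */

/* Hypothetical stand-in for a compound page head. */
struct page_model {
	unsigned int order;
};

/* One base-page-sized segment of a compound page. */
struct vec_model {
	struct page_model	*page;		/* compound head page */
	unsigned int		base_idx;	/* base page within the compound page */
	unsigned int		len;
	unsigned int		offset;		/* offset within that base page */
};

/* Hypothetical stand-in for struct bio. */
struct bio_model {
	unsigned int		vcnt;
	struct vec_model	vecs[MODEL_MAX_VECS];
};

/*
 * Model of a bio_add_compound_page() style wrapper. Returns the number of
 * bytes added, or 0 if the range does not fit (nothing is added then).
 */
static unsigned int bio_add_compound_page_model(struct bio_model *bio,
						struct page_model *page,
						unsigned int len,
						unsigned int offset)
{
	unsigned int total = BASE_PAGE_SIZE << page->order;
	unsigned int first, last, added = 0;

	assert(offset <= total && len <= total - offset);
	if (len == 0)
		return 0;

	/* Check up front that every segment fits, so we never add a partial range. */
	first = offset / BASE_PAGE_SIZE;
	last = (offset + len - 1) / BASE_PAGE_SIZE;
	if (last - first + 1 > MODEL_MAX_VECS - bio->vcnt)
		return 0;

	while (added < len) {
		unsigned int pos = offset + added;
		unsigned int in_page = pos % BASE_PAGE_SIZE;
		unsigned int chunk = BASE_PAGE_SIZE - in_page;
		struct vec_model *v = &bio->vecs[bio->vcnt++];

		if (chunk > len - added)
			chunk = len - added;

		v->page = page;
		v->base_idx = pos / BASE_PAGE_SIZE;
		v->len = chunk;
		v->offset = in_page;
		added += chunk;
	}
	return added;
}
```

A matching wrapper on the completion side would need to gather the per-segment completions back into one event for the compound page; that half is not modelled here.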
> Comments appreciated.
So far it's much less intrusive than I expected ;)
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group