linux-kernel - Re: [RFC 0/8] Variable Order Page Cache

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070419224225.GJ32602149@melbourne.sgi.com>
Date:	Fri, 20 Apr 2007 08:42:25 +1000
From:	David Chinner <dgc@....com>
To:	Christoph Lameter <clameter@....com>
Cc:	linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Paul Jackson <pj@....com>, Dave Chinner <dgc@....com>,
	Andi Kleen <ak@...e.de>
Subject: Re: [RFC 0/8] Variable Order Page Cache

On Thu, Apr 19, 2007 at 09:35:04AM -0700, Christoph Lameter wrote:
> Variable Order Page Cache Patchset
> 
> This patchset modifies the core VM so that higher order page cache pages
> become possible. The higher order page cache pages are compound pages
> and can be handled in the same way as regular pages.
> 
> The order of the pages is determined by the order set up in the mapping
> (struct address_space). By default the order is set to zero.
> This means that higher order pages are optional. There is no attempt here
> to generally change the page order of the page cache. 4K pages are effective
> for small files.
> 
> However, it would be good if the VM would support I/O to higher order pages
> to enable efficient support for large scale I/O. If one wants to write a
> long file of a few gigabytes then the filesystem should have a choice of
> selecting a larger page size for that file and handle larger chunks of
> memory at once.
> 
> The support here is only for buffered I/O and only for one filesystem (ramfs).
> Modification of other filesystems to support higher order pages may require
> extensive work of other components of the kernel. But I hope this shows that
> there is a relatively easy way to that goal that could be taken in steps..

So looking at this the main thing for converting a filesystem is some extra
bits in the mount process and replacing PAGE_CACHE_* macros with
page_cache_*() wrapper functions.

We can probably set all this up trivially with XFS by allowing block size > page
size filesystems to be mounted and modifying the way we feed pages to a bio
to be aware of compound pages.

> Note that the higher order pages are subject to reclaim. This works in general
> since we are always operating on a single page struct. Reclaim is fooled to
> think that it is touching page sized objects (there are likely issues to be
> fixed there if we want to go down this road).
> 
> What is currently not supported:
> - Buffer heads for higher order pages (possible with the compound pages in mm
>   that do not use page->private requires upgrade of the buffer cache layers).

Does this mean that the -mm code will currently support bufferheads on compound
pages? We need that before we can get XFS to work with compound pages.

> - Higher order pages in the block layer etc.

It's more drivers that we have to worry about, I think.  We don't need to
modify bios to explicitly support compound pages. From bio.h:

/*
 * was unsigned short, but we might as well be ready for > 64kB I/O pages
 */
struct bio_vec {
        struct page     *bv_page;
        unsigned int    bv_len;
        unsigned int    bv_offset;
};

So compound pages should be transparent to anything that doesn't
look at the contents of bio_vecs....

> - Mmapping higher order pages

*nod*

hmmm - what about the way we do copyin and copyout from the page cache? ie
we kmap_atomic() them before we access them. Does this need to change?

> The ramfs driver can be used to test higher order page cache functionality
> (and may help troubleshoot the VM support until we get some real filesystem
> and real devices supporting higher order pages).

I don't think it will take much to get XFS to work with a high order
page cache and we can probably insulate the block layer initially with some
kind of bio_add_compound_page() wrapper and some similar
wrapper on the io completion side.

> Comments appreciated.

So far it's much less intrusive than I expected ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/