Message-ID: <m1mz0vkoux.fsf@ebiederm.dsl.xmission.com>
Date: Thu, 26 Apr 2007 11:49:26 -0600
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Christoph Hellwig <hch@...radead.org>
Cc: Nick Piggin <nickpiggin@...oo.com.au>, David Chinner <dgc@....com>,
clameter@....com, linux-kernel@...r.kernel.org,
Mel Gorman <mel@...net.ie>,
William Lee Irwin III <wli@...omorphy.com>,
Jens Axboe <jens.axboe@...cle.com>,
Badari Pulavarty <pbadari@...il.com>,
Maxim Levitsky <maximlevitsky@...il.com>
Subject: Re: [00/17] Large Blocksize Support V3
Christoph Hellwig <hch@...radead.org> writes:
> On Thu, Apr 26, 2007 at 04:50:06PM +1000, Nick Piggin wrote:
>> Improving the buffer layer would be a good way. Of course, that is
>> a long and difficult task, so nobody wants to do it.
>
> It's also a stupid idea. We got rid of the buffer layer because it's
> a complete pain in the ass, and now you want to reintroduce one that's
> even more complex, and most likely even slower than the elegant solution?
No. I'm really suggesting improving the translation from BIOs
to the page cache: a set of helper functions.

This patch is suggesting we move to a BSD-like buffer cache, except
that everything is physically mapped.

My most practical suggestion is to have support code so that you can
do all of the locking (that I/O cares about) on the first page of a
page group in the page cache. You don't need larger physical pages to
do that.
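
A minimal userspace sketch of the "lock on the first page of a page group" idea follows. It is an illustration only, not kernel code: struct toy_page, GROUP_ORDER, group_head(), group_trylock() and group_unlock() are hypothetical names, the group size of four pages is an assumption, and the plain bool stands in for what would have to be an atomic test-and-set in any concurrent setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-cache page; not the kernel's struct page. */
struct toy_page {
    unsigned long index;   /* page offset within the file */
    bool locked;           /* only meaningful on the group's first page */
};

#define GROUP_ORDER 2UL                    /* assumed: 4 pages per group */
#define GROUP_PAGES (1UL << GROUP_ORDER)

/*
 * Return the first page of the group containing pages[index].
 * Assumes pages[] is indexed by page offset and starts at offset 0.
 */
static struct toy_page *group_head(struct toy_page *pages, unsigned long index)
{
    return &pages[index & ~(GROUP_PAGES - 1)];
}

/*
 * Try to take the I/O lock for the whole group via its first page.
 * Single-threaded sketch: a real version would need an atomic operation.
 */
static bool group_trylock(struct toy_page *pages, unsigned long index)
{
    struct toy_page *head = group_head(pages, index);

    if (head->locked)
        return false;
    head->locked = true;
    return true;
}

/* Release the group lock, again by way of the first page. */
static void group_unlock(struct toy_page *pages, unsigned long index)
{
    group_head(pages, index)->locked = false;
}
```

Any page in a group resolves to the same head, so locking page 5 and then trying page 7 contends on page 4 without the four pages needing to be physically contiguous.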
>> Well, for those architectures (and this would solve your large block
>> size and 16TB pagecache size without any core kernel changes), you
>> can manage 1<<order hardware ptes as a single Linux pte. There is
>> nothing that says you must implement PAGE_SIZE as a single TLB sized
>> page.
>
> Well, ppc64 can do that. And guess what, it's really painful for a lot
> of workloads. Think of a poor ps3 with 256 from which the broken hypervisor
> already takes a lot away and now every file in the pagecache takes
> 64k, every thread stack takes 64k, etc? It's good to have variable
> sized objects in places where it makes sense, and the pagecache is
> definitively one of them.
Agreed, the page cache is all about variable sized objects known as files!
You don't need to do anything extra. The problem is only with building
I/O requests from what is there.

Iff we really need the larger physical page size to support the hardware
then it makes sense to go down a path of larger pages. But it doesn't.

There is also a more fundamental reason this patch is silly. It assumes
that there is a trivial mapping between filesystems (the primary target
of the page cache) and blocks on disk. Now I admit this is common but
there is no reason to suppose it is true, and this patch appears to
exacerbate things.
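
To make the mapping point concrete, here is a small sketch of how a contiguous range of file blocks can break into several disk runs when the filesystem maps it through extents. It is a toy model, not any filesystem's real interface: struct toy_extent, struct toy_run and map_range() are hypothetical, and extents are assumed to be sorted by file block with holes simply skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent: `len` file blocks starting at `file_blk`
 * are stored at disk blocks starting at `disk_blk`. */
struct toy_extent {
    unsigned long file_blk;
    unsigned long disk_blk;
    unsigned long len;
};

/* One disk-contiguous run, i.e. what a single I/O request could cover. */
struct toy_run {
    unsigned long disk_blk;
    unsigned long len;
};

/*
 * Split file blocks [start, start + count) into disk-contiguous runs.
 * Extents must be sorted by file_blk; unmapped blocks are skipped.
 * Returns the number of runs written to `out`; stops once `out` is full.
 */
static size_t map_range(const struct toy_extent *ext, size_t n_ext,
                        unsigned long start, unsigned long count,
                        struct toy_run *out, size_t max_runs)
{
    unsigned long end = start + count;
    unsigned long prev_file_end = 0;   /* file block where the last run ended */
    size_t n = 0;

    for (size_t i = 0; i < n_ext && n < max_runs; i++) {
        unsigned long lo = ext[i].file_blk;
        unsigned long hi = lo + ext[i].len;

        if (hi <= start || lo >= end)
            continue;                  /* extent does not overlap the range */
        if (lo < start)
            lo = start;
        if (hi > end)
            hi = end;

        unsigned long disk = ext[i].disk_blk + (lo - ext[i].file_blk);

        /* Merge only when adjacent both in the file and on disk. */
        if (n > 0 && prev_file_end == lo &&
            out[n - 1].disk_blk + out[n - 1].len == disk) {
            out[n - 1].len += hi - lo;
        } else {
            out[n].disk_blk = disk;
            out[n].len = hi - lo;
            n++;
        }
        prev_file_end = hi;
    }
    return n;
}
```

With extents {0, 100, 4}, {4, 104, 2} and {6, 500, 4}, a request for six file blocks starting at block 2 yields two runs: the first two extents merge because they are adjacent on disk, while the third lands elsewhere and needs its own request, however the pages backing it are sized.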
Eric