linux-kernel - Re: [00/17] Large Blocksize Support V3

Message-ID: <m1mz0vkoux.fsf@ebiederm.dsl.xmission.com>
Date:	Thu, 26 Apr 2007 11:49:26 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Christoph Hellwig <hch@...radead.org>
Cc:	Nick Piggin <nickpiggin@...oo.com.au>, David Chinner <dgc@....com>,
	clameter@....com, linux-kernel@...r.kernel.org,
	Mel Gorman <mel@...net.ie>,
	William Lee Irwin III <wli@...omorphy.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Badari Pulavarty <pbadari@...il.com>,
	Maxim Levitsky <maximlevitsky@...il.com>
Subject: Re: [00/17] Large Blocksize Support V3

Christoph Hellwig <hch@...radead.org> writes:

> On Thu, Apr 26, 2007 at 04:50:06PM +1000, Nick Piggin wrote:
>> Improving the buffer layer would be a good way. Of course, that is
>> a long and difficult task, so nobody wants to do it.
>
> It's also a stupid idea.  We got rid of the buffer layer because it's
> a complete pain in the ass, and now you want to reintroduce one that's
> even more complex, and most likely even slower than the elegant solution?

No. I'm really suggesting improving the translation from BIOs
to the page cache.  A set of helper functions.

This patch is suggesting we move to a BSD-like buffer cache, except
that everything is physically mapped.

My most practical suggestion is to have support code so that you can
do all of the locking (that I/O cares about) on the first page of a
page group in the page cache.  You don't need larger physical pages to
do that.
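
To make the page-group idea concrete, here is a minimal userspace sketch.
It is a single-threaded model, not kernel code: struct model_page,
GROUP_ORDER, group_head(), group_trylock() and group_unlock() are all
hypothetical names invented for illustration, and the group size of four
pages is an assumption.  The point it shows is that the lock state lives
only on the first page of each group, while the pages themselves stay
ordinary PAGE_SIZE pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names are real kernel APIs. */
#define GROUP_ORDER 2u                  /* assumed: 4 pages per group */
#define GROUP_PAGES (1u << GROUP_ORDER)

struct model_page {
    bool locked;    /* only meaningful on the first page of a group */
};

/* Index of the first page of the group that contains `index`. */
static size_t group_head(size_t index)
{
    return index & ~((size_t)GROUP_PAGES - 1);
}

/*
 * Lock the whole group by locking only its head page.
 * Not atomic: this models the bookkeeping, not the concurrency.
 */
static bool group_trylock(struct model_page *cache, size_t index)
{
    struct model_page *head = &cache[group_head(index)];

    if (head->locked)
        return false;
    head->locked = true;
    return true;
}

/* Unlock the group that contains `index`. */
static void group_unlock(struct model_page *cache, size_t index)
{
    struct model_page *head = &cache[group_head(index)];

    assert(head->locked);
    head->locked = false;
}
```

Locking page 5 and then page 7 contends on the same head (page 4), while
page 1 belongs to a different group and locks independently.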

>> Well, for those architectures (and this would solve your large block
>> size and 16TB pagecache size without any core kernel changes), you
>> can manage 1<<order hardware ptes as a single Linux pte. There is
>> nothing that says you must implement PAGE_SIZE as a single TLB sized
>> page.
>
> Well, ppc64 can do that.  And guess what, it's really painful for a lot
> of workloads.  Think of a poor ps3 with 256 from which the broken hypervisor
> already takes a lot away and now every file in the pagecache takes
> 64k, every thread stack takes 64k, etc?  It's good to have variable
> sized objects in places where it makes sense, and the pagecache is
> definitely one of them.

Agreed, the page cache is all about variable-sized objects known as files!
You don't need to do anything extra.  The problem is only with building
I/O requests from what is there.
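
As a rough illustration of "building I/O requests from what is there",
the sketch below coalesces a sorted list of page indices into larger
requests.  It is a userspace model only: struct io_request and
build_requests() are hypothetical names, and it assumes for simplicity
that adjacent page indices can be submitted together, which a real
filesystem would have to check against its block mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request descriptor: a run of adjacent page indices. */
struct io_request {
    size_t first_page;
    size_t nr_pages;
};

/*
 * Coalesce a sorted array of page indices into requests of at most
 * max_pages pages each.  Returns the number of requests written to
 * out; stops early if out_cap is exhausted.
 */
static size_t build_requests(const size_t *pages, size_t n,
                             size_t max_pages,
                             struct io_request *out, size_t out_cap)
{
    size_t nreq = 0;

    assert(max_pages > 0);
    for (size_t i = 0; i < n; i++) {
        if (nreq > 0) {
            struct io_request *last = &out[nreq - 1];

            /* Extend the previous request if this page follows it. */
            if (pages[i] == last->first_page + last->nr_pages &&
                last->nr_pages < max_pages) {
                last->nr_pages++;
                continue;
            }
        }
        if (nreq == out_cap)
            break;
        out[nreq].first_page = pages[i];
        out[nreq].nr_pages = 1;
        nreq++;
    }
    return nreq;
}
```

The page size never changes here; only the size of the request built
from existing pages does.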

Iff we really need the larger physical page size to support the hardware
then it makes sense to go down a path of larger pages.  But we don't.

There is also a more fundamental reason this patch is silly.  It assumes
that there is a trivial mapping between filesystems (the primary target
of the page cache) and blocks on disk.  Now I admit this is common, but
there is no reason to suppose it is true, and this patch appears to
exacerbate things.
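
To show what a non-trivial mapping looks like, here is a small sketch of
an extent-style lookup.  The names struct extent and map_block() are
hypothetical and the layout is invented for illustration; it is not how
any particular filesystem stores its mapping.  It shows that a file
block that is logically adjacent to its neighbour may sit somewhere
unrelated on disk, or not be allocated at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: len file blocks starting at file_block are
 * stored contiguously on disk starting at disk_block. */
struct extent {
    uint64_t file_block;
    uint64_t disk_block;
    uint64_t len;
};

/*
 * Translate a file block to a disk block.  On success returns 0 and
 * stores the disk block plus the number of blocks that remain
 * contiguous on disk from that point.  Returns -1 for a hole.
 */
static int map_block(const struct extent *map, size_t n, uint64_t fblock,
                     uint64_t *dblock, uint64_t *run)
{
    assert(dblock != NULL && run != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct extent *e = &map[i];

        if (fblock >= e->file_block && fblock - e->file_block < e->len) {
            uint64_t off = fblock - e->file_block;

            *dblock = e->disk_block + off;
            *run = e->len - off;
            return 0;
        }
    }
    return -1;  /* no extent covers this block */
}
```

With extents {0 -> 1000, len 4} and {4 -> 50, len 2}, file blocks 3 and
4 are neighbours in the file but land at disk blocks 1003 and 50, so a
large page spanning both would still need two separate I/O requests.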

Eric