linux-kernel - Re: [00/17] Large Blocksize Support V3


Message-ID: <4630C326.2030906@yahoo.com.au>
Date:	Fri, 27 Apr 2007 01:20:06 +1000
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	David Chinner <dgc@....com>
CC:	Christoph Lameter <clameter@....com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	linux-kernel@...r.kernel.org, Mel Gorman <mel@...net.ie>,
	William Lee Irwin III <wli@...omorphy.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Badari Pulavarty <pbadari@...il.com>,
	Maxim Levitsky <maximlevitsky@...il.com>
Subject: Re: [00/17] Large Blocksize Support V3

David Chinner wrote:
> On Thu, Apr 26, 2007 at 05:48:12PM +1000, Nick Piggin wrote:
> 
>>Christoph Lameter wrote:
>>
>>>On Thu, 26 Apr 2007, Nick Piggin wrote:
>>>
>>>
>>>
>>>>No I don't want to add another fs layer.
>>>
>>>
>>>Well maybe you could explain what you want. Preferably without redefining 
>>>the established terms?
>>
>>Support for larger buffers than page cache pages.
> 
> 
> The problem with this approach is that it turns around the whole
> way we look at bufferheads. Right now we have a well-defined 1:n
> mapping of page to bufferheads, and so we typically lock the
> page first, then iterate all the bufferheads on the page.
> 
> Going the other way, we need to support m:n, which means
> the buffer has to become the primary interface for the filesystem
> to the page cache, i.e. we need to lock the bufferhead first, then
> iterate all the pages on it. This is messy because the cache indexes
> via pages, not bufferheads. Hence a buffer needs to point to all the
> pages in it explicitly, and this leads to interesting issues with
> locking.
> 
> If you still think that this is a good idea, I suggest that you

Yeah, I think it possibly is. But I don't think that much of the
vm should need to be touched at all, especially not mmap. And I
think only those filesystems interested in supporting it would
have to implement it (maybe a couple), not all of them.

I didn't run into all the issues you mentioned yet, and I don't
see why it would have to turn into a real (traditional) buffer
cache layer rather than just a Linux (ie. block mapping) buffer
layer.
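
To make the two mapping directions concrete, here is a minimal sketch of
the arithmetic only. It assumes power-of-two page and block sizes so that
the divisions are exact, and the function names are hypothetical, made up
for illustration; none of them are kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Block size <= page size: one page carries several buffers (1:n). */
static size_t buffers_per_page(size_t page_size, size_t block_size)
{
    assert(block_size > 0 && block_size <= page_size);
    return page_size / block_size;
}

/* Block size > page size: one buffer spans several page cache pages. */
static size_t pages_per_buffer(size_t page_size, size_t block_size)
{
    assert(page_size > 0 && block_size >= page_size);
    return block_size / page_size;
}

/* Index of the first page cache page covered by block number block_nr
 * (hypothetical helper; assumes block 0 starts at page 0). */
static size_t first_page_of_block(size_t block_nr, size_t page_size,
                                  size_t block_size)
{
    return block_nr * pages_per_buffer(page_size, block_size);
}
```

With 4K pages, a 1K block size gives four buffers per page, while a 16K
block size gives one buffer covering four pages, which is the case where
the buffer would have to reference each of its pages explicitly.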


>>So block size > page cache size... also, you should obviously be using
>>hardware that is tuned to work well with 4K pages, because surely there
>>is lots of that around.
> 
> 
> The CPU hardware works well with 4k pages, but in general I/O
> hardware works more efficiently as the number of s/g entries it
> requires drops for a given I/O size. Given that we limit drivers to
> 128 s/g entries, we really aren't using I/O hardware to its full
> potential or at its most efficient by limiting each s/g entry to a
> single 4k page.
> 
> And FWIW, having a buffer for block size > page size does not
> solve this problem - only contiguous page allocation solves this
> problem.

But this seems to be due to your final IO request size, rather than
the number of sg entries the card has to process, right? My point
is that MMUs and the whole memory and data transfer system are quite
likely to be optimised for 4K pages, so it would be strange to think
that an IO card could not do that.

Why do we limit drivers to 128 sg entries?
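
For reference, the arithmetic behind the 128-entry figure can be sketched
as below. This is only an illustration: it assumes every s/g entry maps
exactly one physically contiguous chunk of a fixed size and that adjacent
pages are never merged into one entry. The function names are made up for
the example and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Largest request, in bytes, that fits in nr_entries s/g entries when
 * each entry covers exactly entry_bytes of contiguous memory. */
static size_t max_request_bytes(size_t nr_entries, size_t entry_bytes)
{
    assert(entry_bytes > 0);
    return nr_entries * entry_bytes;
}

/* Number of s/g entries needed for a request of request_bytes,
 * rounding up to a whole entry. */
static size_t sg_entries_needed(size_t request_bytes, size_t entry_bytes)
{
    assert(entry_bytes > 0);
    return (request_bytes + entry_bytes - 1) / entry_bytes;
}
```

Under those assumptions, 128 entries of one 4K page each cap a request at
512K, whereas 128 entries of 64K contiguous memory would allow 8M. Going
the other way, a 1M request needs 256 entries at 4K per entry but only 16
at 64K per entry.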

-- 
SUSE Labs, Novell Inc.