linux-kernel - Re: [00/17] Large Blocksize Support V3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Thu, 26 Apr 2007 15:37:28 +1000
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
CC:	clameter@....com, linux-kernel@...r.kernel.org,
	Mel Gorman <mel@...net.ie>,
	William Lee Irwin III <wli@...omorphy.com>,
	David Chinner <dgc@....com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Badari Pulavarty <pbadari@...il.com>,
	Maxim Levitsky <maximlevitsky@...il.com>
Subject: Re: [00/17] Large Blocksize Support V3

Eric W. Biederman wrote:
> clameter@....com writes:
> 
> 
>>V2->V3
>>- More restructuring
>>- It actually works!
>>- Add XFS support
>>- Fix up UP support
>>- Work out the direct I/O issues
>>- Add CONFIG_LARGE_BLOCKSIZE. Off by default which makes the inlines revert
>>  back to constants. Disabled for 32bit and HIGHMEM configurations.
>>  This also allows a gradual migration to the new page cache
>>  inline functions. LARGE_BLOCKSIZE capabilities can be
>>  added gradually and if there is a problem then we can disable
>>  a subsystem.
>>
>>V1->V2
>>- Some ext2 support
>>- Some block layer, fs layer support etc.
>>- Better page cache macros
>>- Use macros to clean up code.
>>
>>This patchset modifies the Linux kernel so that larger block sizes than
>>page size can be supported. Larger block sizes are handled by using
>>compound pages of an arbitrary order for the page cache instead of
>>single pages with order 0.
> 
> 
> Huh?
> 
> You seem to be mixing two very different concepts.
> 
> The page cache has no problems supporting things with a block
> size larger then page size.  Now the block device layer may not
> have the code to do the scatter gather into small pages and it
> may not handle buffer heads whose data is split between multiple
> pages. 

Yeah, this patch is not really large blocksize support (which we normally
think of as block size > page cache size).


> But this is not a page cache issue.
> 
> And generally larger physical pages are a mistake to use.
> Especially as it looks from some of the later comment you don't
> date test on 32bit because the memory fragments faster.

I actually completely agree with this, and I'm concerned in general about
using higher order pages. I think it is fundamentally the wrong approach
because of fragmentation and defragmentation costs (similarly to Linus's
take on page colouring).

I think starting with the assumption that we _want_ to use higher order
allocations, and then creating all this complexity around that is not a
good one, and if we start introducing things that _require_ significant
higher order allocations to function then it is a nasty thing for
robustness.


> Is it common for hardware that supports large block sizes to not
> support splitting those blocks apart during DMA?  Unless it is common
> the whole premise of this patchset seems broken.
> 
> I suspect what needs to be fixed is the page cache block device
> interface so that we have helper functions that know how to stuff
> a single block into several pages.

I am working now and again on some code to do this, it is a big job but
I think it is the right way to do it. But it would take a long time to
get stable and supported by filesystems...


> That would make the choice of using larger order pages (essentially
> increasing PAGE_SIZE) something that can be investigated in parallel.

I agree that hardware inefficiencies should be handled by increasing
PAGE_SIZE (not making PAGE_CACHE_SIZE > PAGE_SIZE) at the arch level.

-- 
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/