Date:	Fri, 27 Apr 2007 01:38:30 +1000
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	David Chinner <dgc@....com>
CC:	"Eric W. Biederman" <ebiederm@...ssion.com>, clameter@....com,
	linux-kernel@...r.kernel.org, Mel Gorman <mel@...net.ie>,
	William Lee Irwin III <wli@...omorphy.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Badari Pulavarty <pbadari@...il.com>,
	Maxim Levitsky <maximlevitsky@...il.com>
Subject: Re: [00/17] Large Blocksize Support V3

David Chinner wrote:
> On Thu, Apr 26, 2007 at 04:10:32AM -0600, Eric W. Biederman wrote:
>>Ok.  Now why are high end hardware manufacturers building crippled
>>hardware?  Or is there only an 8-bit field in SCSI for describing
>>scatter-gather entries?  Although I would think this would be
>>more of a controller rather than a drive issue.
> 
> 
> scsi.h:
> 
> /*
>  *      The maximum sg list length SCSI can cope with
>  *      (currently must be a power of 2 between 32 and 256)
>  */
> #define SCSI_MAX_PHYS_SEGMENTS  MAX_PHYS_SEGMENTS
> 
> And from blkdev.h:
> 
> #define MAX_PHYS_SEGMENTS 128
> #define MAX_HW_SEGMENTS 128
> 
> So currently on SCSI we are limited to 128 s/g entries, and the
> maximum is 256.  So I'd say we've got good grounds for needing
> contiguous pages to go beyond 1MB I/O size on x86_64.

Or good grounds to increase the sg limit and push for I/O controller
manufacturers to do the same. If we have a hack in the kernel that
mostly works, they won't.

Page colouring was always rejected, and lots of people who knew
better got upset because it was the only way the hardware would go
fast...


>>>And what do we do for arches that can't do multiple page sizes, or
>>>only have a limited and mostly useless set of page sizes to choose
>>>from?
>>
>>You have HW_PAGE_SIZE != PAGE_SIZE.
> 
> 
> That's rather wasteful, though. Better to only use the large pages
> when the filesystem needs them rather than penalise all filesystems.

But 16k pages are fine for ia64. While you're talking about
special-casing stuff, surely a bigger page size could be the config
option instead of higher-order pagecache.


>>That is, you hide from the bulk of the kernel the fact that a
>>struct page manages 2 or more real hardware pages.
>>But you expose it to the handful of places that actually care.
>>Partly this is a path you are starting down in your patches, with
>>larger page cache support.
> 
> 
> Right, exactly. So apart from the contiguous allocation issue, you think
> we are doing the right thing?

You could put it that way. Or that it is wrong because of the
fragmentation problem. Realise that it is somewhat fundamental,
considering that it is basically an unsolvable problem given our
current kernel assumptions of unconstrained kernel allocations and
a 1:1 kernel mapping.

-- 
SUSE Labs, Novell Inc.
