Date:	Thu, 26 Apr 2007 13:23:25 +0100 (IST)
From:	Mel Gorman <mel@....ul.ie>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
cc:	Nick Piggin <nickpiggin@...oo.com.au>, David Chinner <dgc@....com>,
	clameter@....com,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	William Lee Irwin III <wli@...omorphy.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Badari Pulavarty <pbadari@...il.com>,
	Maxim Levitsky <maximlevitsky@...il.com>
Subject: Re: [00/17] Large Blocksize Support V3

On Thu, 26 Apr 2007, Eric W. Biederman wrote:

> mel@...net.ie (Mel Gorman) writes:
>
>> On (26/04/07 18:55), Nick Piggin didst pronounce:
>>> Mel Gorman wrote:
>>>> On (26/04/07 16:50), Nick Piggin didst pronounce:
>>>
>>>>> Fragmentation is the problem. The anti-frag patches don't actually
>>>>> guarantee anything about fragmentation, and even if they did, then
>>>>
>>>> Grouping pages by mobility does not guarantee anything, but the memory
>>>> partition (kernelcore= boot parameter) does give hard guarantees about
>>>> the amount of memory that is "movable". Of course, the partition requires
>>>> configuration at boot-time so it's less than ideal, but it does give hard
>>>> guarantees.
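
(For illustration, the partition is set on the kernel command line; the
size here is made up:

    kernelcore=512M

This reserves 512M for unmovable kernel allocations and leaves the rest
of memory for movable allocations only, which is where the hard
guarantee comes from.)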
>>>
>>> For the hugepages people, I can understand that's a solution.
>>
>> I think that's the closest you have ever come to saying fragmentation
>> avoidance is not a terrible idea :)
>>
>>> But that's
>>> the last thing you want to do on a system with a limited amount of memory,
>>> or a regular Joe's desktop/server.
>>>
>>
>> Regular Joe is not going to be creating a filesystem with large blocks or
>> be overly concerned with saturating all the disks hanging off his RAID
>> array. At most, he'll care about faster DVD writing and, considering the
>> number and duration of those allocations, the system will be able to
>> handle it. So I don't think Regular Joe will generally care.
>
> The practical question is whether this is a special-purpose hack or
> general infrastructure that we expect filesystems to seriously
> use.
>
> If this is a special-purpose hack then the larger pages and
> fragmentation are minor issues, but its very existence is a problem.
>
> If this is not a special-purpose hack and people do use this a lot
> then the fragmentation is a problem.
>

Like so many other things, I think this would start as something used by
the minority of users whose hardware requires this sort of feature
and slowly bubble down until it appears on day-to-day machines. I
wouldn't describe it as a hack, but it's fair to say that it'll start as
special purpose. It won't stay that way forever.

> Your reply indicates that fragmentation is a concern, as do the
> initial posting and the more positive threads.  Therefore either
> this is a special-purpose hack, in which case its existence is
> questionable, or it fails as a general solution and its existence
> is questionable.
>

Or it'll start as a special-purpose feature and evolve into a general
solution.

>
>>>> Indeed, but then you have to deal with internal fragmentation
>>>> for pages-larger-than-TLB-page. I'm not saying it's wrong but it does
>>>> come with its own set of issues.
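
(By internal fragmentation I mean, for example, that a 1K file cached
in a 64K pagecache page wastes the other 63K; the sizes here are
illustrative.)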
>>>
>>> None of them is perfect (the ways to increase the size of pagecache pages,
>>> that is).
>>>
>>> I think in the long term, TLB page sizes will probably increase a little
>>> bit... but if a given page size is "good enough" for a CPU, it really
>>> should be good enough for other hardware. I mean, come on, the CPU's TLB
>>> has to have a good hit ratio and handle several lookups per cycle with a
>>> 3-cycle latency on 3GHz+ hardware... surely an IO controller's
>>> scatter-gather engine or IOMMU that has to do a few lookups per disk IO
>>> is nowhere near as critical as a CPU's datapath: just add a few more
>>> entries to it; they've already got hundreds of megs of cache, so that
>>> isn't an issue either.
>>>
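
(Rough arithmetic on that comparison, with round numbers of my own: a
TLB on a 3GHz CPU servicing even one lookup per cycle fields on the
order of 3 billion translations a second, while an IO translation
engine doing a few lookups per disk IO at 100,000 IOPS does well under
a million. That is three or four orders of magnitude of slack.)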
>>
>> I cannot speak with authority on what IO controllers are really capable
>> of so maybe someone else will comment on this more.
>
> Well, here is some reality.  Using an FPGA (slow by definition)
> connected to a hypertransport interface, I have exceeded a gigabyte a
> second, without even setting up scatter-gather.
>
> All of the I/O on all of the peripheral busses happens in chunks
> smaller than one page.
>
> At the rates we are talking about for disks, the issue is really how to
> bury the disk latency.  For fast arrays, how do you submit enough
> requests to bury the latency?
>
> I can't possibly believe any of this is about the cost of processing
> a request; rather, the problem is that some devices don't have a large
> enough pool of requests to keep them busy if you submit the requests
> in 4K page-sized chunks.
>
> This all sounds like a device design issue rather than anything more
> significant, and it doesn't sound like a long-term trend.  Market
> pressure should fix the hardware.
>
>
> Eric
>
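
To put rough numbers on the request-pool point (the figures below are
my assumptions, not Eric's): by Little's law, sustained bandwidth
equals requests in flight times request size divided by completion
latency, so the queue depth needed to bury a given latency falls
linearly as the request size grows. A minimal sketch:

#include <stdio.h>

/*
 * Back-of-the-envelope: requests that must be kept in flight to
 * sustain a target bandwidth, given a per-request completion latency.
 * Little's law: depth = bandwidth * latency / size.  All figures are
 * illustrative assumptions, not measurements from the thread.
 */
int main(void)
{
	double bandwidth = 1e9;		/* 1GB/s, as in the FPGA example */
	double latency = 1e-3;		/* assumed 1ms completion time */
	double sizes[] = { 4096, 16384, 65536 };
	int i;

	for (i = 0; i < 3; i++)
		printf("%3.0fK requests: need ~%.0f in flight\n",
		       sizes[i] / 1024, bandwidth * latency / sizes[i]);
	return 0;
}

With 4K requests, a couple of hundred must be outstanding to hide a
millisecond of latency; with 64K requests, about fifteen. That gap is
what a larger request pool (or larger blocks) has to cover.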

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab