lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Thu, 26 Apr 2007 13:23:25 +0100 (IST) From: Mel Gorman <mel@....ul.ie> To: "Eric W. Biederman" <ebiederm@...ssion.com> cc: Nick Piggin <nickpiggin@...oo.com.au>, David Chinner <dgc@....com>, clameter@....com, Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, William Lee Irwin III <wli@...omorphy.com>, Jens Axboe <jens.axboe@...cle.com>, Badari Pulavarty <pbadari@...il.com>, Maxim Levitsky <maximlevitsky@...il.com> Subject: Re: [00/17] Large Blocksize Support V3 On Thu, 26 Apr 2007, Eric W. Biederman wrote: > mel@...net.ie (Mel Gorman) writes: > >> On (26/04/07 18:55), Nick Piggin didst pronounce: >>> Mel Gorman wrote: >>>> On (26/04/07 16:50), Nick Piggin didst pronounce: >>> >>>>> Fragmentation is the problem. The anti-frag patches don't actually >>>>> guarantee anything about fragmentation, and even if they did, then >>>> >>>> >>>> The grouping pages by mobility do not guarantee anything but the memory >>>> partition (kernelcore= boot parameter) does give hard guarantees about >>>> the amount of memory that is "movable". Of course, the partition requires >>>> configuration at boot-time so it's less than ideal but it does give hard >>>> guarantees. >>> >>> For the hugepages people, I can understand that's a solution. >> >> I think that's the closest you have ever come to saying fragmentation >> avoidance is not a terrible idea :) >> >>> But that's >>> the last thing you want to do on a system with a limited amount of memory, >>> or a regular Joe's desktop/server. >>> >> >> Regular Joe is not going to be creating a filesystem with large blocks or >> overly concerned with saturating all the disks hanging off his RAID array. >> At most, he'll care about faster DVD writing and considering the number and >> duration of those allocations, the system will be able to handle it. So I >> don't think Regular Joe will generally care. > > The practical question is if this a special purpose hack, or is > this general infrastructure that we expect filesystems to seriously > use. > > If this is a special purpose hack then the larger pages and > fragmentation are minor issues, but it's very existence is a problem. > > If this is not a special purpose hack and people do use this a lot > then the fragmentation is a problem. > Like so many other things, I think this would start as something used by the minority of users with the hardware that requires this sort of feature and slowly bubbles down until it appears on day-to-day machines. I wouldn't describe it as a hack but it's fair to say that it'll start as special purpose. It won't stay that way forever. > Your reply indicates that fragmentation is a concern, as does the > initial posting and the more positive threads. Therefore either > this is a special purpose hack in which case it's existence is > questionable or it fails a general solution and it's existence > is questionable. > Or it'll start as a special purpose feature and evolve to be a general solution. > >>>> Indeed but then you have to deal with internal fragmentation >>>> for pages-larger-than-TLB-page. I'm not saying it's wrong but it does >>>> come with it's own set of issues. >>> >>> None of them is perfect (the ways to increase the size of pagecache pages, >>> that is). >>> >>> I think in the long term, TLB page sizes will probably increase a little >>> bit... but if a given page size is "good enough" for a CPU, they really >>> should be good enough for other hardware. I mean, come on, the CPU's TLB >>> has to have a good hit ratio and handle several lookups per cycle with a >>> 3-cycle latency on 3GHz+ hardware... surely a an IO controller's >>> scatter-gather engine or IOMMU that has to do a few lookups per disk IO >>> is nowhere near so critical as a CPU's datapath: just add a few more >>> entries to it, they've already got hundreds of megs of cache, so that >>> isn't an issue either. >>> >> >> I cannot speak with authority on what IO controllers are really capable >> of so maybe someone else will comment on this more. > > Well here is some reality. Using an FPGA (slow by definition) > connected to a hypertransport interface I have exceeded a gigabyte a > second, without even setting up scatter gather. > > All of the I/O on all of the peripheral busses happens in sub > 1 page chunks. > > At the rates we are talking for disks the issue is really how to > bury the disk latency. For fast arrays how do you submit enough > request to bury the latency. > > I can't possibly believe any of this is about the cost of processing > a request, but rather the problem that some devices don't have a large > enough pool of requests to keep them busy if you submit the requests > in a 4K page sizes. > > This all sounds like a device design issue rather than anything more > significant, and it doesn't sound like a long term trend. Market > pressure should fix the hardware. > > > Eric > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists