Message-ID: <4631E173.7000204@yahoo.com.au>
Date: Fri, 27 Apr 2007 21:41:39 +1000
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Paul Mackerras <paulus@...ba.org>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Christoph Lameter <clameter@....com>,
David Chinner <dgc@....com>, linux-kernel@...r.kernel.org,
Mel Gorman <mel@...net.ie>,
William Lee Irwin III <wli@...omorphy.com>,
Jens Axboe <jens.axboe@...cle.com>,
Badari Pulavarty <pbadari@...il.com>,
Maxim Levitsky <maximlevitsky@...il.com>
Subject: Re: [00/17] Large Blocksize Support V3

Paul Mackerras wrote:
> Andrew Morton writes:
>
>
>>If x86 had a larger pagesize we wouldn't be seeing any of this. It is a workaround
>>for present-generation hardware.
>
>
> Unfortunately, it's not really practical to increase the page size
> very much on most systems, because you end up wasting a lot of space
> in the page cache. So there is a tension between wanting a small page
> size so your page cache uses memory efficiently, and wanting a large
> page size so the TLB covers more address space and your programs run
> faster (not to mention other benefits such as the kernel having to
> manage fewer pages, and I/O being done in bigger chunks).
>
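Just to put rough numbers on that tension (a back-of-the-envelope toy
with made-up file sizes and TLB figures, nothing measured): a cache
full of small files wastes far more memory at 64K than at 4K, while
per-entry TLB reach only grows 16x.

/* Toy illustration only: page cache waste vs. TLB reach for 4K and
 * 64K pages.  All input figures below are hypothetical. */
#include <stdio.h>

int main(void)
{
	unsigned long long page_sizes[] = { 4096, 65536 };
	unsigned long long avg_file = 6000;	/* hypothetical small file, bytes */
	unsigned long long nr_files = 100000;	/* hypothetical cached file count */
	unsigned long long tlb_entries = 1024;	/* hypothetical TLB size */
	int i;

	for (i = 0; i < 2; i++) {
		unsigned long long ps = page_sizes[i];
		/* each cached file rounds up to whole pages */
		unsigned long long pages = (avg_file + ps - 1) / ps;
		unsigned long long used = nr_files * pages * ps;
		unsigned long long waste = used - nr_files * avg_file;

		printf("%lluK pages: cache %llu MB (%llu MB wasted), "
		       "TLB reach %llu MB\n",
		       ps >> 10, used >> 20, waste >> 20,
		       (tlb_entries * ps) >> 20);
	}
	return 0;
}
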
> Thus there is not really any single page size that suits all workloads
> and machines. With distros wanting to ship just a single kernel per
> architecture, and the fact that the page size is a compile-time
> constant, we currently end up having to pick one size and just put up
> with the fact that it will suck for some users. We have this
> situation on ppc64 now that POWER5+ and POWER6 machines have
> hardware support for 64K pages as well as 4K pages.
>
> So I can see a few different options:
>
> (a) Keep things more or less as they are now and just wear the fact
> that we will continue to show lower performance than certain
> proprietary OSes, or
>
> (b) Somehow manage to make the page size a variable rather than a
> compile-time constant, and pick a suitable page size at boot time
> based on how much memory the machine has, or something. I looked at
> implementing this at one point and recoiled in horror. :)
>
> (c) Make the page cache able to use small pages for small files and
> large pages for large files. AIUI this is basically what Christoph is
> proposing.
>
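For what it's worth, the rough shape of that idea, as I understand it,
is a per-file allocation order instead of one global compile-time
constant.  A toy sketch with hypothetical names (not Christoph's
actual patch, just the flavour of the bookkeeping):

/* Hypothetical names only, not Christoph's patch: the page cache
 * index calculation takes a per-file order instead of a fixed
 * compile-time constant. */
#include <stdio.h>

#define BASE_SHIFT 12			/* 4K base page */

struct mapping {
	unsigned int order;		/* 0 = 4K pages, 4 = 64K pages */
};

/* byte offset -> page cache index at this mapping's order */
static unsigned long index_of(const struct mapping *m, unsigned long pos)
{
	return pos >> (BASE_SHIFT + m->order);
}

int main(void)
{
	struct mapping small_file = { .order = 0 };	/* 4K pages */
	struct mapping large_file = { .order = 4 };	/* 64K pages */
	unsigned long pos = 1UL << 20;			/* 1MB into the file */

	printf("index at 1MB: small file %lu, large file %lu\n",
	       index_of(&small_file, pos), index_of(&large_file, pos));
	return 0;
}
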
> Option (a) isn't very palatable to me (nor, I expect, to Christoph :)
> since it basically says that Linux is very much focussed on the
> embedded and desktop end of things and isn't really suitable as a
> high-performance OS for large SMP systems. I don't want to believe
> that. ;)
>
> Option (b) would be a bit of an ugly hack.
>
> Which leaves option (c) - unless you have a further option. So I have
> to say I support Christoph on this, at least as far as the general
> principle is concerned.

For the TLB issue, higher-order pagecache doesn't help. If distros
ship with a 4K page size on powerpc, and use some larger pages in
the pagecache, some people are still going to get angry because
they wanted to use 64K pages... But I agree 64K pages are too big
for most things anyway, and 16K would be better as a default (which
hopefully x86-64 will get one day).

Anyway, for I/O performance there are alternatives, despite what
some people seem to be saying. We can submit larger sglists to the
device for larger I/Os, which Jens is looking at (and which could
help all types of workloads, not just those doing sequential
large-file I/O).
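
To make the sglist point concrete, here is a simplified userspace
mock-up (not the kernel's struct scatterlist API): one large request
described by a list of discontiguous 4K pages, so a 1MB I/O needs
neither a 1MB page nor physically contiguous memory.

/* Simplified userspace mock-up, not the kernel's scatterlist API. */
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096U

struct sg_entry {
	void *addr;		/* page address */
	unsigned int len;	/* bytes used in this page */
};

int main(void)
{
	unsigned int nr_pages = 256;	/* a 1MB request */
	struct sg_entry *sg = malloc(nr_pages * sizeof(*sg));
	unsigned long total = 0;
	unsigned int i;

	for (i = 0; i < nr_pages; i++) {
		/* the data pages need not be physically contiguous */
		sg[i].addr = malloc(PAGE_SIZE);
		sg[i].len = PAGE_SIZE;
		total += sg[i].len;
	}
	printf("1 request, %u sg entries, %lu bytes\n", nr_pages, total);

	for (i = 0; i < nr_pages; i++)
		free(sg[i].addr);
	free(sg);
	return 0;
}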

After that, I'd find it amusing if HBAs worth thousands of dollars
had trouble walking sglists at the relatively glacial pace that I/O
requires, or couldn't spare a few more KB for reasonable sglist
sizes; but if that really is the case, then we could use IOMMUs
and/or just attempt to put physically contiguous pages in the
pagecache, rather than require it.
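
By "attempt rather than require" I mean something like the following
(a userspace stand-in, not kernel code): try a larger contiguous
allocation first, and quietly fall back to single pages when that
fails.

/* Userspace stand-in for "attempt, don't require" contiguity. */
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096U

static void *alloc_try_contig(unsigned int nr_pages, unsigned int *got)
{
	/* try one big contiguous chunk first (stands in for a
	 * higher-order page allocation) */
	void *p = malloc((size_t)nr_pages * PAGE_SIZE);

	if (p) {
		*got = nr_pages;
		return p;
	}
	*got = 1;			/* fall back to a single page */
	return malloc(PAGE_SIZE);
}

int main(void)
{
	unsigned int got;
	void *p = alloc_try_contig(16, &got);	/* ask for 64K worth */

	printf("got %u page(s) in one chunk\n", got);
	free(p);
	return 0;
}
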
--
SUSE Labs, Novell Inc.