[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4600D5EB.90507@yahoo.com.au>
Date: Wed, 21 Mar 2007 17:51:23 +1100
From: Nick Piggin <nickpiggin@...oo.com.au>
To: William Lee Irwin III <wli@...omorphy.com>
CC: Adam Litke <agl@...ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Arjan van de Ven <arjan@...radead.org>,
Christoph Hellwig <hch@...radead.org>,
Ken Chen <kenchen@...gle.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 1/7] Introduce the pagetable_operations and associated
helper macros.
William Lee Irwin III wrote:
> William Lee Irwin III wrote:
>
>>>ISTR potential ppc64 users coming out of the woodwork for something I
>>>didn't recognize the name of, but I may be confusing that with your
>>>patch. I can implement additional users (and useful ones at that)
>>>needing this in particular if desired.
>
>
> On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
>
>>Yes I would be interested in seeing useful additional users of this
>>that cannot use our regular virtual memory, before making it a general
>>thing.
>>I just don't want to see proliferation of these things, if possible.
>
>
> I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
> in a few weeks I can start up on the first two of the bunch.
Care to give us a hint? :)
> William Lee Irwin III wrote:
>
>>>Two fault handling methods callbacks raise an eyebrow over here at least.
>>>I was vaguely hoping for unification of the fault handling callbacks.
>
>
> On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
>
>>I don't know if it would be so clean to do that as they are at different
>>levels.
>>Adam's fault is before the VM translation (and bypasses it), and mine is
>>after.
>
>
> Not much of a VM translation; it's just a lookup through the software
> mocked-up structures on everything save i386, x86_64, and some m68k where
> they're the same thing only with hardware walkers (ISTR ia64's being
> firmware a la Alpha despite the "HPW" name, though I could be wrong)
Well the vma+pagetables *are* our VM translation data structure. It is
a good data structure. The Gelato/UNSW guys experimenting with changing
this have basically said they haven't yet got anything that beats it.
I would be opposed to anything that bypasses that unless a) it is not
applicable to the VM as a whole, and b) it is really worth it
(hugepages was a reasonable exception).
> reliant on them. The drivers/etc. could just as easily use helper
> functions to carry out the lookup, thereby accomplishing the
> unification. There's nothing particularly fundamental about a pte
> lookup.
Yeah you could, but it looks back to front to me.
The VM tells the filesystem that the machine took a fault at virtual
address X, then the filesystem asks the VM what pgoff that is, then
tells the VM to install the corresponding page to vaddr X.
With my ->fault, the VM asks the filesystem to give the page that
corresponds to vaddr X, then installs it into that vaddr.
> Normal arches that do software TLB refill could just as easily
> consult the radix trees dangled off struct address_space or any old
> data structure floating around the kernel with enough information to
> translate user virtual addresses to the physical addresses they need to
> fill the TLB with, and there are other kernels that literally do things
> like that.
Sure it *could* be done, but it may not be very nice, given Linux's
design. And you definitely need _something_ other than just the
pagecache radix-tree, because the VM needs to know who maps the page.
So if, for your backing store, you use a small hash table and evict old
entries like powerpc, you'll constantly be faulting in and out pages
from the VM's high level view of the address space. That isn't a really
cheap operation. It takes at least:
read_lock_irq(mapping->tree_lock);
radix_tree_lookup()
read_unlock_irq(mapping->tree_lock);
lock_page()
atomic_add(page->_count)
atomic_add(page->_mapcount)
unlock_page()
atomic_add_negative(page->_mapcount)
atomic_dec_and_test(page->_count)
Compared to our current page table walk which is just a single locked
op + barrier for the spinlock + radix tree walk.
If you had a very large hash table (ia64 long mode, maybe?), then you
may have slightly fewer high level faults, but range based operations
are going to take a whole lot of cache misses, aren't they? Especially
for small processes.
Not that I wouldn't be happy to be proven wrong, but I don't think it
should be something that sneaks in under these pagetable operations.
IMO.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists