[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080123105246.GG26420@sgi.com>
Date: Wed, 23 Jan 2008 04:52:47 -0600
From: Robin Holt <holt@....com>
To: Avi Kivity <avi@...ranet.com>
Cc: Christoph Lameter <clameter@....com>,
Andrea Arcangeli <andrea@...ranet.com>,
Izik Eidus <izike@...ranet.com>, Andrew Morton <akpm@...l.org>,
Nick Piggin <npiggin@...e.de>, kvm-devel@...ts.sourceforge.net,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
steiner@....com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
daniel.blueman@...drics.com, holt@....com,
Hugh Dickins <hugh@...itas.com>
Subject: Re: [kvm-devel] [PATCH] export notifier #1
On Wed, Jan 23, 2008 at 12:27:57PM +0200, Avi Kivity wrote:
>> The approach with the export notifier is page based not based on the
>> mm_struct. We only need a single page count for a page that is exported to
>> a number of remote instances of linux. The page count is dropped when all
>> the remote instances have unmapped the page.
>
> That won't work for kvm. If we have a hundred virtual machines, that means
> 99 no-op notifications.
But 100 callouts holding spinlocks will not work for our implementation
and even if the callouts are made with spinlocks released, we would very
strongly prefer a single callout which messages the range to the other
side.
> Also, our rmap key for finding the spte is keyed on (mm, va). I imagine
> most RDMA cards are similar.
For our RDMA rmap, it is based upon physical address.
>> There is only the need to walk twice for pages that are marked Exported.
>> And the double walk is only necessary if the exporter does not have its
>> own rmap. The cross partition thing that we are doing has such an rmap and
>> its a matter of walking the exporters rmap to clear out the external
>> references and then we walk the local rmaps. All once.
>>
>
> The problem is that external mmus need a reverse mapping structure to
> locate their ptes. We can't expand struct page so we need to base it on mm
> + va.
Our rmap takes a physical address and turns it into mm+va.
> Can they wait on that bit?
PageLocked(page) should work, right? We already have a backoff
mechanism so we expect to be able to adapt it to include a
PageLocked(page) check.
Thanks,
Robin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists