Message-ID: <479716AD.5070708@qumranet.com>
Date: Wed, 23 Jan 2008 12:27:57 +0200
From: Avi Kivity <avi@...ranet.com>
To: Christoph Lameter <clameter@....com>
CC: Andrea Arcangeli <andrea@...ranet.com>,
Izik Eidus <izike@...ranet.com>, Andrew Morton <akpm@...l.org>,
Nick Piggin <npiggin@...e.de>, kvm-devel@...ts.sourceforge.net,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
steiner@....com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
daniel.blueman@...drics.com, holt@....com,
Hugh Dickins <hugh@...itas.com>
Subject: Re: [kvm-devel] [PATCH] export notifier #1

Christoph Lameter wrote:
> Ahhh. Good to hear. But we will still end in a situation where only
> the remote ptes point to the page. Maybe the remote instance will dirty
> the page at that point?
>
>
When the spte is dropped, its dirty bit is transferred to the page.
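
To make the dirty-bit handoff concrete, here is a minimal userspace sketch. It is not the KVM code: the bit positions, `struct page_model`, and `drop_spte()` are all hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define SPTE_PRESENT (1ULL << 0)
#define SPTE_DIRTY   (1ULL << 1)

/* Stand-in for the backing page; only the dirty flag is modelled. */
struct page_model {
    bool dirty;
};

/* Drop a shadow pte, carrying its dirty bit over to the backing page. */
static void drop_spte(uint64_t *spte, struct page_model *page)
{
    if ((*spte & SPTE_PRESENT) && (*spte & SPTE_DIRTY))
        page->dirty = true;   /* the write is not lost with the mapping */
    *spte = 0;                /* the mapping is gone after this point */
}
```

So even if the remote side is the last one to touch the page, the write is recorded on the page at the moment the mapping goes away.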
>
>> sharing code, and for you missing a single notifier means memory
>> corruption because you don't bump the page count to represent the
>> external reference).
>>
>
> The approach with the export notifier is page based not based on the
> mm_struct. We only need a single page count for a page that is exported to
> a number of remote instances of linux. The page count is dropped when all
> the remote instances have unmapped the page.
>
That won't work for kvm. If we have a hundred virtual machines, that
means 99 no-op notifications.
Also, our rmap for finding the spte is keyed on (mm, va). I imagine
most RDMA cards are similar.
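
A rough userspace model of the cost, assuming one listener per VM on a single page-based notifier chain. `struct vm_model`, `notify_all()` and `count_noops()` are hypothetical names, not kernel interfaces.

```c
#include <stddef.h>

#define NR_VMS 100

/* One entry per virtual machine registered on the notifier chain. */
struct vm_model {
    int maps_page;      /* 1 if this VM has an spte for the page */
    int invalidations;  /* callbacks that had real work to do */
    int noops;          /* callbacks with nothing to do */
};

/* Page-based scheme: every registered listener is called for the page. */
static void notify_all(struct vm_model *vms, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (vms[i].maps_page) {
            vms[i].maps_page = 0;
            vms[i].invalidations++;
        } else {
            vms[i].noops++;
        }
    }
}

/* Total number of callbacks that found nothing to invalidate. */
static size_t count_noops(const struct vm_model *vms, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += (size_t)vms[i].noops;
    return total;
}
```

With a hundred VMs and the page mapped by only one of them, a single `notify_all()` does one real invalidation and 99 wasted calls. A notifier hung off the mm would call only the VM that owns the mapping.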
>
>
>>> @@ -966,6 +973,9 @@ int try_to_unmap(struct page *page, int
>>>
>>> BUG_ON(!PageLocked(page));
>>>
>>> + if (unlikely(PageExported(page)))
>>> + export_notifier(invalidate_page, page);
>>> +
>>>
>> Passing the page here will complicate things, especially for pages
>> shared across different VMs that are already working in KVM. For non
>>
>
> How?
>
>
>> shared pages we could cache the userland mapping address in
>> page->private but it's a kludge only working for non-shared
>> pages. Walking twice the anon_vma lists when only a single walk is
>>
>
> There is only the need to walk twice for pages that are marked Exported.
> And the double walk is only necessary if the exporter does not have its
> own rmap. The cross partition thing that we are doing has such an rmap and
> it's a matter of walking the exporter's rmap to clear out the external
> references and then we walk the local rmaps. All once.
>
The problem is that external MMUs need a reverse mapping structure to
locate their ptes. We can't expand struct page, so we need to base it on
mm + va.
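
Roughly what a reverse map keyed on mm + va looks like, as a userspace sketch only. The table layout, the hash, and the `rmap_*` names are hypothetical and are not taken from the KVM sources; the point is that the lookup key is (mm, va) and nothing is stored in struct page.

```c
#include <stddef.h>
#include <stdint.h>

#define RMAP_SLOTS 64

/* One reverse-map entry: (mm, va) is the key, the spte is the value. */
struct rmap_entry {
    const void *mm;   /* identifies the address space */
    uintptr_t va;     /* userspace virtual address */
    uint64_t *spte;   /* shadow pte mapping that address; NULL = free slot */
};

struct rmap_table {
    struct rmap_entry slot[RMAP_SLOTS];
};

static size_t rmap_hash(const void *mm, uintptr_t va)
{
    return (size_t)((((uintptr_t)mm) >> 4) ^ (va >> 12)) % RMAP_SLOTS;
}

/* Insert with linear probing; returns 0 on success, -1 if the table is full. */
static int rmap_add(struct rmap_table *t, const void *mm, uintptr_t va,
                    uint64_t *spte)
{
    size_t h = rmap_hash(mm, va);

    for (size_t i = 0; i < RMAP_SLOTS; i++) {
        struct rmap_entry *e = &t->slot[(h + i) % RMAP_SLOTS];

        if (!e->spte) {
            e->mm = mm;
            e->va = va;
            e->spte = spte;
            return 0;
        }
    }
    return -1;
}

/* Look up the spte for (mm, va); returns NULL if there is no mapping. */
static uint64_t *rmap_find(const struct rmap_table *t, const void *mm,
                           uintptr_t va)
{
    size_t h = rmap_hash(mm, va);

    for (size_t i = 0; i < RMAP_SLOTS; i++) {
        const struct rmap_entry *e = &t->slot[(h + i) % RMAP_SLOTS];

        if (!e->spte)
            return NULL;          /* free slot ends the probe sequence */
        if (e->mm == mm && e->va == va)
            return e->spte;
    }
    return NULL;
}
```

A callback that is handed only a struct page has nothing to feed into `rmap_find()`. One that is handed the mm and the address can go straight to the spte.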
>
>> Besides the pinned-pages RAM leak caused by the zero locking window
>> above, I'm curious how you are going to take care of the fine-grained
>> aging that I'm doing with the accessed bit set by hardware in the spte
>>
>
> I think I explained that above. Remote users effectively are forbidden to
> establish new references to the page by the clearing of the exported bit.
>
>
Can they wait on that bit?
--
error compiling committee.c: too many arguments to function
--