Message-ID: <20141031193932.GE38315@google.com>
Date: Fri, 31 Oct 2014 12:39:32 -0700
From: Peter Feiner <pfeiner@...gle.com>
To: zhanghailiang <zhang.zhanghailiang@...wei.com>
Cc: Andrea Arcangeli <aarcange@...hat.com>, qemu-devel@...gnu.org,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Andres Lagar-Cavilla <andreslc@...gle.com>,
Dave Hansen <dave@...1.net>,
Paolo Bonzini <pbonzini@...hat.com>,
Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
Andy Lutomirski <luto@...capital.net>,
Andrew Morton <akpm@...ux-foundation.org>,
Sasha Levin <sasha.levin@...cle.com>,
Hugh Dickins <hughd@...gle.com>,
"Dr. David Alan Gilbert" <dgilbert@...hat.com>,
Christopher Covington <cov@...eaurora.org>,
Johannes Weiner <hannes@...xchg.org>,
Android Kernel Team <kernel-team@...roid.com>,
Robert Love <rlove@...gle.com>,
Dmitry Adamushko <dmitry.adamushko@...il.com>,
Neil Brown <neilb@...e.de>, Mike Hommey <mh@...ndium.org>,
Taras Glek <tglek@...illa.com>, Jan Kara <jack@...e.cz>,
KOSAKI Motohiro <kosaki.motohiro@...il.com>,
Michel Lespinasse <walken@...gle.com>,
Minchan Kim <minchan@...nel.org>,
Keith Packard <keithp@...thp.com>,
"Huangpeng (Peter)" <peter.huangpeng@...wei.com>,
Isaku Yamahata <yamahata@...inux.co.jp>,
Anthony Liguori <anthony@...emonkey.ws>,
Stefan Hajnoczi <stefanha@...il.com>,
Wenchao Xia <wenchaoqemu@...il.com>,
Andrew Jones <drjones@...hat.com>,
Juan Quintela <quintela@...hat.com>
Subject: Re: [PATCH 00/17] RFC: userfault v2
On Fri, Oct 31, 2014 at 11:29:49AM +0800, zhanghailiang wrote:
> Agreed, but for a live memory snapshot (the VM keeps running while the snapshot is taken),
> we have to do this (block the write), because we have to save the page before it
> is dirtied by the write. This is the difference compared to pre-copy migration.
Ah ha, I understand the difference now. I suppose that you have considered
doing a traditional pre-copy migration (that is, passes over memory saving
dirty pages, followed by a pause and a final dump of remaining dirty pages) to
a file. Your approach has the advantage of having the VM pause time bounded by
the time it takes to handle the userfault and do the write, as opposed to
pre-copy migration which has a pause time bounded by the time it takes to do
the final dump of dirty pages, which, in the worst case, is the time it takes
to dump all of the guest memory!
You could use the old fork & dump trick. Given that the guest's memory is
backed by a private VMA (which, as of a year ago when I last looked, is always
the case for QEMU), you can have the kernel do the write protection for you.
Essentially, you fork QEMU and, in the child process, dump the guest memory
and then exit. If the parent (including the guest) writes to guest memory, it
will fault and the kernel will copy the page.
The fork & dump approach will give you the best performance w.r.t. guest pause
times (i.e., just pausing for the COW fault handler), but it does have the
distinct disadvantage of potentially using 2x the guest memory (i.e., if the
parent process races ahead and writes to all of the pages before you finish the
dump). To mitigate memory copying, you could call madvise(MADV_DONTNEED) on the
child's memory as you dump it.
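
To make the idea concrete, here is a minimal sketch of fork & dump, assuming
the memory is a page-aligned MAP_PRIVATE mapping. The names fork_and_dump and
dump_and_release and the 64-page chunk size are made up for illustration; this
is not QEMU code. Whether MADV_DONTNEED in the child really saves the copy
depends on the kernel's COW reuse logic, so treat that part as a mitigation
rather than a guarantee.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: write the snapshot chunk by chunk and drop each chunk once it
 * is on disk, so the child stops sharing those pages with the parent.
 * mem must be page-aligned and chunk a multiple of the page size. */
static int dump_and_release(int fd, char *mem, size_t len, size_t chunk)
{
    assert(chunk > 0 && chunk % (size_t)sysconf(_SC_PAGESIZE) == 0);
    for (size_t off = 0; off < len; off += chunk) {
        size_t n = len - off < chunk ? len - off : chunk;
        size_t done = 0;
        while (done < n) {                 /* handle short writes */
            ssize_t w = write(fd, mem + off + done, n - done);
            if (w < 0)
                return -1;
            done += (size_t)w;
        }
        if (madvise(mem + off, n, MADV_DONTNEED) != 0)
            return -1;
    }
    return 0;
}

/* Fork; the child sees memory as it was at fork() time and dumps it to path.
 * Returns the child's pid in the parent, or -1 if fork() failed. The parent
 * keeps running and only pays a COW fault on pages it writes. */
static pid_t fork_and_dump(char *mem, size_t len, const char *path)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                        /* parent, or fork error */

    size_t chunk = 64 * (size_t)sysconf(_SC_PAGESIZE); /* arbitrary choice */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        _exit(1);
    int rc = dump_and_release(fd, mem, len, chunk);
    close(fd);
    _exit(rc == 0 ? 0 : 1);
}
```

The parent would call fork_and_dump(guest_mem, guest_len, "snapshot.img") and
later waitpid() on the returned pid; a zero exit status means the dump file
holds the memory contents as of the fork.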
> Great! Do you plan to post your patches to the community? I mean, is your work based on
> QEMU, or is it an independent tool (CRIU migration?) for live migration?
> Maybe I could fix the migration problem for ivshmem in QEMU now,
> based on the softdirty mechanism.
I absolutely plan on releasing these patches :-) CRIU was the first open-source
userland I had planned on integrating with. At Google, I'm working with our
home-grown QEMU replacement. However, I'd be happy to help with an effort to
get softdirty integrated into QEMU in the future.
> >Documentation/vm/soft-dirty.txt and pagemap.txt in case you aren't familiar. To
>
> I have skimmed them; it is indeed useful for pre-copy. But it seems that
> it cannot meet my needs for snapshots.
> >make softdirty usable for live migration, I've added an API to atomically
> >test-and-clear the bit and write protect the page.
>
> How can I find the API? Has it been merged into the kernel's master branch already?
Negative. I'll be sure to CC you when I start sending this stuff upstream.
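
In the meantime, the soft-dirty interface that is already documented in
Documentation/vm/soft-dirty.txt can be driven from userspace roughly as
follows. This is only a sketch of the existing clear_refs/pagemap interface
(write "4" to clear_refs, then read bit 55 of the pagemap entry); it is not
the atomic test-and-clear API mentioned above, and the helper names are made
up for illustration. It also assumes a kernel built with soft-dirty support.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_SOFT_DIRTY_BIT 55   /* soft-dirty flag in a 64-bit pagemap entry */

/* Extract the soft-dirty flag from one raw pagemap entry. */
static int pagemap_entry_soft_dirty(uint64_t entry)
{
    return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/* Clear the soft-dirty bits of the calling process.
 * Returns 0 on success, -1 on error. */
static int clear_soft_dirty(void)
{
    int fd = open("/proc/self/clear_refs", O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t w = write(fd, "4", 1);
    close(fd);
    return w == 1 ? 0 : -1;
}

/* Returns 1 if the page containing addr has been written since the last
 * clear, 0 if not, -1 if the pagemap entry could not be read. */
static int page_is_soft_dirty(const void *addr)
{
    long psz = sysconf(_SC_PAGESIZE);
    uint64_t entry = 0;
    if (psz <= 0)
        return -1;
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    /* One 8-byte entry per virtual page, indexed by page number. */
    off_t off = (off_t)((uintptr_t)addr / (uintptr_t)psz) * (off_t)sizeof(entry);
    ssize_t r = pread(fd, &entry, sizeof(entry), off);
    close(fd);
    return r == (ssize_t)sizeof(entry) ? pagemap_entry_soft_dirty(entry) : -1;
}
```

A pre-copy pass would call clear_soft_dirty(), let the guest run, and then
resend every page for which page_is_soft_dirty() returns 1. The gap between
the test and the next clear is exactly the race the atomic API is meant to
close.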
Peter