linux-kernel - Re: [RFC PATCH V2 3/3] Ixgbevf: Add migration support for ixgbevf driver


Message-ID: <CAKgT0UdM5NGOARoiCNvh3Hu0xyvfJ-VoRqDu8bg6RyupSCEYHw@mail.gmail.com>
Date:	Wed, 25 Nov 2015 08:24:38 -0800
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	"Lan, Tianyu" <tianyu.lan@...el.com>
Cc:	"Michael S. Tsirkin" <mst@...hat.com>,
	a.motakis@...tualopensystems.com,
	Alex Williamson <alex.williamson@...hat.com>,
	b.reynal@...tualopensystems.com,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	Carolyn Wyborny <carolyn.wyborny@...el.com>,
	"Skidmore, Donald C" <donald.c.skidmore@...el.com>,
	eddie.dong@...el.com, nrupal.jani@...el.com,
	Alexander Graf <agraf@...e.de>, kvm@...r.kernel.org,
	Paolo Bonzini <pbonzini@...hat.com>, qemu-devel@...gnu.org,
	"Tantilov, Emil S" <emil.s.tantilov@...el.com>,
	Or Gerlitz <gerlitz.or@...il.com>,
	"Rustad, Mark D" <mark.d.rustad@...el.com>,
	Eric Auger <eric.auger@...aro.org>,
	intel-wired-lan <intel-wired-lan@...ts.osuosl.org>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
	"Ronciak, John" <john.ronciak@...el.com>,
	linux-api@...r.kernel.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Vick, Matthew" <matthew.vick@...el.com>,
	Mitch Williams <mitch.a.williams@...el.com>,
	Netdev <netdev@...r.kernel.org>,
	"Nelson, Shannon" <shannon.nelson@...el.com>,
	Wei Yang <weiyang@...ux.vnet.ibm.com>, zajec5@...il.com
Subject: Re: [RFC PATCH V2 3/3] Ixgbevf: Add migration support for ixgbevf driver

On Wed, Nov 25, 2015 at 8:02 AM, Lan, Tianyu <tianyu.lan@...el.com> wrote:
> On 11/25/2015 8:28 PM, Michael S. Tsirkin wrote:
>>
>> Frankly, I don't really see what this short term hack buys us,
>> and if it goes in, we'll have to maintain it forever.
>>
>
> The framework of how to notify VF about migration status won't be
> changed regardless of stopping VF or not before doing migration.
> We hope to reach agreement on this first. Tracking dirty memory still
> needs more discussion and we will continue working on it. Stopping the VF
> may help to work around the issue and make tracking easier.

The problem is you still have to stop the device at some point for the
same reason why you have to halt the VM.  You seem to think you can
get by without doing that but you can't.  All you do is open the
system up to multiple races if you leave the device running.  The goal
should be to avoid stopping the device until the last possible moment,
however it will still have to be stopped eventually.  It isn't as if
you can migrate memory and leave the device doing DMA and expect to
get a clean state.
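
To illustrate the race, here is a toy model of pre-copy migration. It is
only a sketch: it assumes DMA writes already show up in the dirty set
(which is exactly the open problem), and the function name and parameters
are hypothetical, not taken from Qemu or the ixgbevf driver.

```python
import random


def precopy_migrate(num_pages, dma_writes_per_pass, max_passes,
                    stop_device_at_end, seed=0):
    """Toy pre-copy loop with a device doing DMA into guest memory.

    Returns the set of pages whose destination copy is stale at the end.
    """
    rng = random.Random(seed)
    dirty = set(range(num_pages))  # nothing copied yet, so every page is dirty
    for _ in range(max_passes):
        # Copy every page currently marked dirty to the destination.
        dirty.clear()
        # The device is still running, so DMA dirties pages during the pass.
        for _ in range(dma_writes_per_pass):
            dirty.add(rng.randrange(num_pages))
    if stop_device_at_end:
        # Device quiesced: no new DMA, so one final pass drains the dirty set.
        dirty.clear()
    return dirty


if __name__ == "__main__":
    print(len(precopy_migrate(64, 8, 5, stop_device_at_end=False)), "stale pages")
    print(len(precopy_migrate(64, 8, 5, stop_device_at_end=True)), "stale pages")
```

In this model the iterative passes can run with the device active, but
the stale set only reaches zero once DMA has stopped before the final
pass.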

I agree with Michael.  The focus needs to be on first addressing dirty
page tracking.  Once you have that you could use a variation on the
bonding solution where you postpone the hot-plug event until near the
end of the migration, just before you halt the guest, instead of having
to do it before you start the migration.  After that we could look at
optimizing things further by introducing a variation of hot-plug that
would pause the device, as I suggested, instead of removing it.  At
that point you should have almost all of the key issues addressed, so
you could drop the bond interface entirely.

>> Also, assuming you just want to do ifdown/ifup for some reason, it's
>> easy enough to do using a guest agent, in a completely generic way.
>>
>
> Just ifdown/ifup is not enough for migration. It needs to restore some PCI
> settings before doing ifup on the target machine.

That is why I have been suggesting making use of suspend/resume logic
that is already in place for PCI power management.  In the case of a
suspend/resume we already have to deal with the fact that the device
will go through a D0->D3->D0 reset so we have to restore all of the
existing state.  It would take a significant load off of Qemu since
the guest would be restoring its own state instead of making Qemu have
to do all of the device migration work.
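
As a rough sketch of the idea, the toy model below has the driver save
its own state on suspend and write it back on resume, after the device
has come back in its reset defaults.  ToyVF, ToyDriver and the register
names are all hypothetical stand-ins; this is not the ixgbevf driver or
the kernel PCI PM API.

```python
class ToyVF:
    """Hypothetical stand-in for a VF; registers are plain dict entries."""

    DEFAULTS = {"msix_enabled": False, "bus_master": False, "ring_base": 0}

    def __init__(self):
        self.power = "D0"
        self.regs = dict(self.DEFAULTS)

    def enter_d3(self):
        self.power = "D3"

    def return_to_d0(self):
        # Coming back from D3 is modelled as a reset: registers are defaults.
        self.power = "D0"
        self.regs = dict(self.DEFAULTS)


class ToyDriver:
    """Guest-side driver that saves and restores its own device state."""

    def __init__(self, dev):
        self.dev = dev
        self.saved = None

    def suspend(self):
        # Snapshot everything the driver programmed, then power down.
        self.saved = dict(self.dev.regs)
        self.dev.enter_d3()

    def resume(self, dev):
        # dev may be a different VF on the target host; either way it
        # starts from reset defaults and the driver reprograms it.
        self.dev = dev
        self.dev.return_to_d0()
        self.dev.regs.update(self.saved)


if __name__ == "__main__":
    src = ToyVF()
    drv = ToyDriver(src)
    src.regs.update(msix_enabled=True, bus_master=True, ring_base=0x1000)
    drv.suspend()          # on the source, near the end of migration
    dst = ToyVF()          # VF assigned on the target
    drv.resume(dst)        # guest restores its own state
    print(dst.power, dst.regs)
```

The point of the sketch is that the saved state lives in guest memory, so
it migrates with the guest and Qemu never has to understand the device
registers.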